Repository for Team BIASES
This section describes how to clone the repository and install the dependencies for the project.
- Make sure Python 3 and Git are installed.
- Open a terminal and clone this repository with
  git clone https://github.com/teambiases/team-biases.git
- Enter the team-biases directory and run
  pip3 install -r requirements.txt
  This should install all the Python libraries needed.
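Before cloning, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of the repository; the function name missing_prerequisites is made up for illustration, and it assumes the tools are invoked as git and pip3, as in the commands above.

```python
import shutil
import sys


def missing_prerequisites():
    """Return the names of prerequisites that could not be found."""
    missing = []
    # The setup steps ask for Python 3; check the running interpreter.
    if sys.version_info.major < 3:
        missing.append("Python 3")
    # shutil.which returns None when an executable is not on the PATH.
    for tool in ("git", "pip3"):
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```

An empty list means the clone and install commands should at least be able to start.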
This section contains information about how the files and packages in this project are laid out.
The main directories and files are as follows:
- src-python: this directory contains the bulk of the Python code in the biases package. See the package layout section for more information about how the code is laid out.
- scripts: this directory contains scripts meant to be run directly from the command line. It also contains the _path_config module, which, when imported at the top of a script file, configures the PYTHONPATH to allow the biases package to be imported.
- README.md: the file you're reading. It contains basic information about the project.
- .gitignore: used by Git to know what types of files it should ignore (for instance, compiled Python files). More information here.
- requirements.txt: a list of Python libraries in the PyPI repository that are requirements for the project. More information here.
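For readers unfamiliar with this pattern, here is a rough sketch of how a path-configuring helper of this kind could work. It is not the actual contents of _path_config; the function name add_source_dir is hypothetical, and the sketch assumes the script lives in a scripts directory that sits next to src-python.

```python
import os
import sys


def add_source_dir(script_file, source_dir_name="src-python"):
    """Put <repo>/<source_dir_name> on sys.path and return that directory.

    Assumes script_file is located in <repo>/scripts.
    """
    scripts_dir = os.path.dirname(os.path.abspath(script_file))
    repo_root = os.path.dirname(scripts_dir)
    source_dir = os.path.join(repo_root, source_dir_name)
    # Insert at the front so the local package wins over installed copies,
    # and avoid adding the same entry twice.
    if source_dir not in sys.path:
        sys.path.insert(0, source_dir)
    return source_dir


# In a script this would typically be called as add_source_dir(__file__)
# before any import of the biases package.
```

Modifying sys.path at runtime has the same effect on imports as setting PYTHONPATH before starting the interpreter, which is why a helper like this lets scripts run without any extra environment setup.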
The packages and modules in src-python are all located under an overarching biases package. To learn more about Python modules, read this. These are the current packages in src-python:
- biases.bias: bias detection code
- biases.wiki: tools for working with Wikipedia
- biases.utils: various utilities in areas such as math or databases
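The list above may drift as the project grows. As a small sketch, the standard library's pkgutil module can enumerate the packages found in a directory; the helper name list_subpackages is made up for this example, and it assumes each package directory contains an __init__.py file.

```python
import pkgutil


def list_subpackages(package_dir, prefix):
    """Return sorted dotted names of the packages directly inside package_dir."""
    return sorted(
        prefix + "." + info.name
        for info in pkgutil.iter_modules([package_dir])
        if info.ispkg  # skip plain modules, keep only packages
    )
```

Called from the repository root as list_subpackages("src-python/biases", "biases"), this should report the packages described above, provided the layout matches this section.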
This section describes how to build the results from Wikipedia dump files.
- Make a directory data/wikipedia/dump and download the following files into that directory:
  - From https://dumps.wikimedia.org/enwiki/20170901/, download enwiki-20170901-pages-articles.xml.bz2.
  - From https://dumps.wikimedia.org/ruwiki/20170901/, download ruwiki-20170901-pages-articles.xml.bz2.
  - From https://dumps.wikimedia.org/eswiki/20170901/, download eswiki-20170901-pages-articles.xml.bz2 and eswiki-20170901-langlinks.sql.gz.
- From the team-biases directory, run
  make topicscorpus
  This will probably take ~24 hours to run.
- Run
  python3 scripts/topics_demo.py data/wikipedia/corpus/coldwar.es-en-ru-wiki-20170901.400topics.pickle
  If everything worked, a web page should pop up where you can inspect the topic distributions of various articles!
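Because the build takes about a day, it may be worth checking that every dump file is in place before starting it. The sketch below is not part of the repository; the names expected_dumps and missing_dumps are hypothetical, and the download URLs are built on the assumption that each file sits directly under the listing page given in the steps above.

```python
import os

DUMP_DATE = "20170901"
DUMP_DIR = os.path.join("data", "wikipedia", "dump")

# File-name suffixes per wiki, as listed in the download steps.
DUMP_SUFFIXES = {
    "enwiki": ["pages-articles.xml.bz2"],
    "ruwiki": ["pages-articles.xml.bz2"],
    "eswiki": ["pages-articles.xml.bz2", "langlinks.sql.gz"],
}


def expected_dumps(date=DUMP_DATE):
    """Return (url, local_path) pairs for each dump file named in the steps."""
    pairs = []
    for wiki, suffixes in DUMP_SUFFIXES.items():
        for suffix in suffixes:
            name = "{}-{}-{}".format(wiki, date, suffix)
            # Assumption: the file is served directly under the listing URL.
            url = "https://dumps.wikimedia.org/{}/{}/{}".format(wiki, date, name)
            pairs.append((url, os.path.join(DUMP_DIR, name)))
    return pairs


def missing_dumps(date=DUMP_DATE):
    """Return the local paths of expected dump files that do not exist yet."""
    return [path for _url, path in expected_dumps(date) if not os.path.isfile(path)]


if __name__ == "__main__":
    for path in missing_dumps():
        print("Missing dump file: " + path)
```

Run from the team-biases directory, the script prints nothing when all four files are present.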