Pythia is Lab41's exploration of approaches to novel content detection. We are interested in making it easier to tell when a document coming into a corpus has something new to say. We welcome your attention and your contributions (see our contributor guidelines).
On a system with Docker, you can get started quickly. The following commands pull our publicly available image and train an XGBoost model on the sample data that comes with the repository:
docker pull lab41/pythia
docker run -it lab41/pythia experiments/experiments.py with XGB=True BOW_APPEND=True BOW_PRODUCT=True
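To build the image yourself instead of pulling it, run the following from the directory that contains the Dockerfile:
docker build -t lab41/pythia . # runs tests and builds project image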
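If you expect to rerun the quick-start experiment with different options, it can help to assemble the command in a small script first. The following is a minimal sketch, not part of Pythia: the names IMAGE, OPTS, pythia_cmd, and pythia_run are hypothetical, and the only image name, script path, and options it uses are the ones from the docker run command above.

```shell
#!/bin/sh
# Hypothetical helper; IMAGE, OPTS, pythia_cmd and pythia_run are
# illustrative names and are not provided by the Pythia repository.
IMAGE="lab41/pythia"
OPTS="XGB=True BOW_APPEND=True BOW_PRODUCT=True"

# Print the quick-start command so it can be inspected before running.
pythia_cmd() {
    printf 'docker run -it %s experiments/experiments.py with %s\n' "$IMAGE" "$OPTS"
}

# Pull the image and run the experiment, but only if Docker is on the PATH.
pythia_run() {
    if command -v docker >/dev/null 2>&1; then
        # OPTS is left unquoted on purpose so each KEY=VALUE pair
        # becomes a separate argument.
        docker pull "$IMAGE" && docker run -it "$IMAGE" experiments/experiments.py with $OPTS
    else
        echo "docker not found; would run: $(pythia_cmd)" >&2
        return 1
    fi
}

# Default action: a dry run that only prints the command.
pythia_cmd
```

Editing OPTS and calling pythia_run would then pull the image and start the experiment. Printing by default keeps the sketch safe to run on a machine without Docker.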
Our code is written in Python 3. On a Debian/Ubuntu system with Anaconda installed, the script envs/make_envs.sh installs the necessary dependencies.