jaindeepali/Adler: Generating a text corpus from the TechTC-300 Test Collection for topic classification.

Ready-to-use text classification corpus generation from the TechTC-300 Test Collection. The final dataset uses chi-squared feature selection and tf-idf feature weighting. Classification is done using a Decision Jungle classifier in AzureML.
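To make the two feature steps concrete, here is a minimal standard-library sketch of tf-idf weighting and chi-squared term scoring. It is an illustration under assumptions, not Adler's actual code: the function names (`tfidf`, `chi_squared`, `select_terms`), the plain `tf * log(N / df)` weighting variant, and the toy documents are all hypothetical, and the package may well use a library implementation with different smoothing.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weighted


def chi_squared(docs, labels, term, positive):
    """Chi-squared statistic of the 2x2 table: term presence vs. class membership."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_class = label == positive
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A term that appears in every document (or none) carries no signal.
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_terms(docs, labels, positive, k):
    """Keep the k terms with the highest chi-squared score for one class."""
    vocab = sorted({term for doc in docs for term in doc})
    ranked = sorted(
        vocab,
        key=lambda term: chi_squared(docs, labels, term, positive),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy corpus: two "tech" documents and two "food" documents.
docs = [
    ["linux", "kernel", "code"],
    ["linux", "driver", "code"],
    ["pasta", "recipe", "code"],
    ["pasta", "sauce", "code"],
]
labels = ["tech", "tech", "food", "food"]

selected = select_terms(docs, labels, positive="tech", k=2)
weights = tfidf(docs)
```

On this toy corpus a term such as "code", which occurs in every document, gets a chi-squared score of zero and a tf-idf weight of zero, so it is neither selected nor influential. That is the reason for combining the two steps: selection discards uninformative terms, and weighting scales the ones that remain.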

Clone the repo: git clone https://github.com/jaindeepali/Adler
Create config file from sample: cp Adler/config/sample.config.json Adler/config/config.json
Open config.json and set the path to your data directory
Create a Python virtual environment: virtualenv .venv
Activate virtual environment: source .venv/bin/activate
Install Adler package: python setup.py install
Run the script to generate the dataset (from the repository root): python scripts/generate.py
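Before running the generation script, it can help to confirm that the edited config actually points at an existing directory. The sketch below is one way to do that. The key name `data_dir` and the helper `load_data_dir` are hypothetical; check sample.config.json for the key the project really uses.

```python
import json
from pathlib import Path


def load_data_dir(config_path, key="data_dir"):
    """Read a JSON config and return the data directory it points to.

    `key` is an assumed name; replace it with the key defined in
    sample.config.json.
    """
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    if key not in config:
        raise KeyError(f"{key!r} not found in {config_path}")
    data_dir = Path(config[key]).expanduser()
    if not data_dir.is_dir():
        raise FileNotFoundError(f"data directory does not exist: {data_dir}")
    return data_dir
```

Calling `load_data_dir` with the path to your config.json should either return the directory or fail with a message that names the missing key or path.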
