Library and scripts to analyze CycIC entangled datasets for the Machine Common Sense (MCS) program.
From the current directory:
python3 -m venv venv
On Unix:
source venv/bin/activate
On Windows:
venv\Scripts\activate
pip install -r requirements.txt
Download and expand the CycIC entangled sample dataset (12/1) to data/downloaded/cycic3_sample.
The sample dataset should consist of:
cycic3_question_links.csv
cycic3a_sample_labels.jsonl
cycic3a_sample_questions.jsonl
cycic3b_sample_labels.jsonl
cycic3b_sample_questions.jsonl
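Before running the tests or the CLI, it can help to confirm that the download expanded into the expected layout. The following is a minimal sketch using only the standard library. The helper name missing_sample_files is hypothetical and is not part of mcs_cycic_analysis; the file names and directory are taken from the list above.

```python
from pathlib import Path

# File names copied from the list of sample dataset files above.
EXPECTED_FILES = (
    "cycic3_question_links.csv",
    "cycic3a_sample_labels.jsonl",
    "cycic3a_sample_questions.jsonl",
    "cycic3b_sample_labels.jsonl",
    "cycic3b_sample_questions.jsonl",
)


def missing_sample_files(dataset_dir):
    """Return the expected file names that are not present in dataset_dir.

    Hypothetical helper for illustration; not part of mcs_cycic_analysis.
    """
    dataset_dir = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (dataset_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_sample_files("data/downloaded/cycic3_sample")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All sample dataset files are present.")
```

An empty result means all five files were found in the directory.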
Activate the virtual environment as above, then run:
pytest
- Activate the virtual environment as above
- Run the command:
python3 -m mcs_cycic_analysis create-spreadsheets
The spreadsheet CSV files will be written to data/spreadsheets/cycic3_sample. This directory will be created if necessary.
The files are intended to be different sheets/tabs in a Google Sheet or Excel file.
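To see which sheets an import would produce, the generated CSV files can be previewed first. This is a rough sketch, assuming the output files are UTF-8 CSVs with a header row, which is not confirmed here. The function name preview_spreadsheets is hypothetical and is not part of mcs_cycic_analysis.

```python
import csv
from pathlib import Path


def preview_spreadsheets(spreadsheets_dir):
    """Map each CSV file's stem (a candidate sheet name) to (header, data row count).

    Hypothetical helper for illustration; assumes the first row is a header.
    """
    summary = {}
    for csv_path in sorted(Path(spreadsheets_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as csv_file:
            rows = list(csv.reader(csv_file))
        header = rows[0] if rows else []
        # Count only the rows after the header; an empty file has zero data rows.
        summary[csv_path.stem] = (header, max(len(rows) - 1, 0))
    return summary


if __name__ == "__main__":
    for sheet, (header, row_count) in preview_spreadsheets(
        "data/spreadsheets/cycic3_sample"
    ).items():
        print(f"{sheet}: {row_count} rows, columns: {', '.join(header)}")
```

Each key in the result corresponds to one sheet/tab when the CSV files are imported.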
The mcs-cycic-analysis code base consists of:
- a library of models for capturing the CycIC dataset
- a command line interface for reading, writing, and manipulating models
Start by looking at mcs_cycic_analysis.cli.commands.create_spreadsheet_command as an entry point.