Cross-lingual offensive language identification project for NLP course @ FRI


ghajduk3/COLI


NLP - Cross-lingual offensive language identification

authors: Gojko Hajdukovic, Simon Dimc, 05.2021

Table of contents:

  1. Setup
  2. Usage

Setup

These instructions assume that you are in the repository's root directory.

cd <repo_root>
  1. To set up a virtual environment, run:
python -m venv venv
# Activate the environment
source venv/bin/activate
  2. To install all project dependencies, run:
pip3 install -r requirements.txt
python -m spacy download en_core_web_sm
  3. Get datasets: Datasets are stored in the folder data/source_data. Get the datasets from the following sources and put them into the folders:

  4. Get models:
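
Once the setup steps above are done, a quick sanity check of the environment can be sketched as follows. This is a minimal sketch and not part of the repository: the helper name `check_setup` is hypothetical, and it only looks for the items named in this README (requirements.txt, the data/source_data folder, and the en_core_web_sm spaCy model). It does not verify that the datasets or models themselves are present.

```python
import importlib.util
from pathlib import Path


def check_setup(repo_root: str = ".") -> dict:
    """Report which setup prerequisites named in the README are in place.

    Hypothetical helper: it checks only for the presence of files, folders,
    and an importable package, not for their contents.
    """
    root = Path(repo_root)
    return {
        # Dependency list used by `pip3 install -r requirements.txt`
        "requirements.txt": (root / "requirements.txt").is_file(),
        # Folder where the README says the datasets belong
        "data/source_data": (root / "data" / "source_data").is_dir(),
        # spaCy models install as importable packages, so find_spec
        # returns None when the model has not been downloaded
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }


def report(repo_root: str = ".") -> None:
    """Print one line per checked item."""
    for name, ok in check_setup(repo_root).items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

Calling `report()` from the repository root prints one status line per item; anything marked MISSING points back to the corresponding setup step.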

Usage

The project implements multiple classifiers for two classification tasks: binary and multiclass. To reproduce the results from the report, a CLI application has been implemented. The following instructions assume that you are in the project's root directory.

  1. To run the CLI application and display its help text, run:
python main.py --help
  2. Examples:
python main.py --prepareData true --type multi --model LR
python main.py -pd false -t bi -m BERT
  3. For BERT fine-tuning, you can use the notebooks/bert-notebook.ipynb notebook in Google Colab.
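
To make the flags in the examples above easier to read, here is a minimal sketch of how such options could be parsed with Python's standard `argparse` module. It is an illustration only, not the repository's main.py: the function names, defaults, and help texts are assumptions, and the accepted values are limited to those that appear in the examples (true/false, bi/multi, and model names such as LR and BERT).

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert the strings 'true'/'false' (any case) to a bool."""
    lowered = value.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected 'true' or 'false', got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags shown in the README examples."""
    parser = argparse.ArgumentParser(
        description="Offensive language classification (illustrative sketch)"
    )
    # Assumption: controls whether the data preparation step runs first
    parser.add_argument("-pd", "--prepareData", type=str_to_bool, default=False,
                        help="'true' or 'false'")
    # 'bi' and 'multi' are the two values seen in the examples
    parser.add_argument("-t", "--type", choices=["bi", "multi"],
                        help="classification task: binary or multiclass")
    # Left unrestricted: only LR and BERT appear in the examples, and the
    # full list of supported models is not given in this README
    parser.add_argument("-m", "--model", help="classifier name, e.g. LR or BERT")
    return parser


# Usage: parse the second README example without touching sys.argv
args = build_parser().parse_args(["-pd", "false", "-t", "bi", "-m", "BERT"])
print(args.prepareData, args.type, args.model)
```

Under this sketch, the short and long forms are interchangeable, so `-pd false -t bi -m BERT` and `--prepareData false --type bi --model BERT` parse to the same values. The authoritative list of options is whatever `python main.py --help` reports.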
