DATING ITALIAN DOCUMENTS USING BERT

Natural Language Processing course project

How to use the software

Download the dataset from the DaDoEval competition website.
Put the training set, test set, and gold file folder into the data folder in this repository.
Create the conda environment with conda env create -f environment.yml. If you don't have conda, install it from here.
Run the preprocessing scripts in src/utils to preprocess the data. Example (from the src folder): python -m utils.preprocess_train_data
Train the model with python train.py. The default strategy uses UmBERTo, truncates each document to its first 512 tokens, and builds the document embedding by summing the [CLS] token embeddings of the last four layers.
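
The default strategy described in the last step can be illustrated with a small, framework-agnostic sketch. This is not the code in train.py: the function names truncate and cls_sum_last_layers and the toy data are hypothetical, and plain Python lists stand in for the tensors a real model would return. It only shows the two operations named above, keeping the first 512 tokens and summing the first-token ([CLS]) vector over the last four layers.

```python
from typing import List, Sequence

MAX_TOKENS = 512  # truncation length stated in the README


def truncate(token_ids: Sequence[int], max_tokens: int = MAX_TOKENS) -> List[int]:
    """Keep only the first `max_tokens` token ids (hypothetical helper)."""
    return list(token_ids[:max_tokens])


def cls_sum_last_layers(
    hidden_states: Sequence[Sequence[Sequence[float]]], n_layers: int = 4
) -> List[float]:
    """Sum the first-token ([CLS]) vector over the last `n_layers` layers.

    `hidden_states[layer][token]` is the hidden vector of one token of a
    single document at one layer (hypothetical layout for this sketch).
    """
    if len(hidden_states) < n_layers:
        raise ValueError("not enough layers to pool over")
    # Position 0 holds the [CLS] token in each layer.
    cls_vectors = [layer[0] for layer in hidden_states[-n_layers:]]
    # Element-wise sum across the selected layers.
    return [sum(values) for values in zip(*cls_vectors)]


# Toy input: 5 layers, 3 tokens, hidden size 2.
toy_hidden_states = [
    [[float(layer), float(layer) * 10.0], [0.0, 0.0], [0.0, 0.0]]
    for layer in range(5)
]
embedding = cls_sum_last_layers(toy_hidden_states)
print(embedding)
```

With a Hugging Face transformers model, per-layer hidden states like these are typically obtained by requesting output_hidden_states=True; whether train.py does it that way is an assumption.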

Notes

For now, parameters must be changed manually in the code. I plan to provide a way to train a given model without editing the code directly.
A GPU is recommended but not required.
On an NVIDIA Titan X GPU, training took about 10 minutes.
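
Since a GPU is optional, a training script usually falls back to the CPU when none is available. The sketch below shows one common way to make that choice; the pick_device function is hypothetical and not taken from train.py, and it assumes PyTorch as the backend, which the README does not state.

```python
def pick_device() -> str:
    """Return "cuda" when a GPU is usable, otherwise "cpu" (hypothetical helper)."""
    try:
        import torch  # assumption: the project trains with PyTorch
    except ImportError:
        # Without PyTorch installed there is nothing to query, so report CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Training would run on: {device}")
```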

Details

For more details about the project and the tests, see the report (NLP_project_report_Graffieti.pdf in this repository).
