A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition

This repository contains the code for the following paper:

@inproceedings{li2021sodner,
 title={A Span-Based Model for Joint Overlapped and Discontinuous Named Entity Recognition},
 author={Li, Fei and Lin, Zhichao and Zhang, Meishan and Ji, Donghong},
 booktitle={Proceedings of the ACL},
 year={2021}
}

Setup

Use "conda" or "virtualenv" to create a Python 3 virtual environment. For example, with "conda", run:

conda create -n sodner python=3.6

Activate the environment.

conda activate sodner

Run the following command to install necessary packages.

pip install -r requirements.txt

Download the PyTorch AllenNLP version of SciBERT and put it into the current directory.
Put the preprocessed data into the "data" directory. The "data/sample" directory can serve as a reference for the expected format when you preprocess the original datasets.

Training & Evaluation

Below is the command to run experiments on the sample dataset. The argument -1 runs on the CPU; to use a GPU, change it to the GPU device index (0 or a larger number).

nohup ./train_sample.sh -1 > sample_0001.log 2>&1 &
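
For launching the same run from Python instead of the shell, the following is a minimal sketch. The helper names `build_train_command` and `launch_training` are hypothetical and not part of this repository; `start_new_session=True` only approximates what `nohup` does (POSIX only).

```python
import subprocess


def build_train_command(cuda_device=-1):
    """Return the argument list for train_sample.sh (-1 = CPU, >= 0 = GPU index)."""
    if cuda_device < -1:
        raise ValueError("cuda_device must be -1 (CPU) or a GPU index >= 0")
    return ["./train_sample.sh", str(cuda_device)]


def launch_training(cuda_device=-1, log_path="sample_0001.log"):
    """Start training in the background, sending stdout and stderr to one log file."""
    with open(log_path, "w") as log:
        # The child process keeps its own copy of the log file descriptor,
        # so closing it here in the parent is safe.
        return subprocess.Popen(
            build_train_command(cuda_device),
            stdout=log,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            start_new_session=True,    # detach from the terminal session
        )
```

Calling `launch_training(0)` would start the run on GPU 0 and return immediately; progress can then be followed in "sample_0001.log".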

Inference

Run the following command.

cuda_device=-1 allennlp predict models/sample_0001/model.tar.gz data/sample/sample.json --include-package sodner --predictor my_predictor --output-file prediction.txt
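
To inspect the results, the output file can be loaded in Python. This is a minimal sketch that assumes "prediction.txt" holds one JSON object per line, which is the default output of "allennlp predict"; if "my_predictor" formats its output differently, the parsing needs to be adapted. The function name `load_predictions` is hypothetical.

```python
import json
from pathlib import Path


def load_predictions(path):
    """Read one JSON object per non-empty line of the prediction file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    output = Path("prediction.txt")
    if output.exists():
        predictions = load_predictions(output)
        print(f"loaded {len(predictions)} prediction records")
```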

Debug

Change the settings in "sample_working_example.jsonnet" as follows.

debug: true,
shuffle: false,

Add the following environment variables to the run configuration of your IDE, such as PyCharm.

ie_test_data_path=./data/sample/sample.json;
ie_dev_data_path=./data/sample/sample.json;
ie_train_data_path=./data/sample/sample.json;
cuda_device=-1;

Run "debug_sample.py" in debug mode.
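
When debugging outside an IDE, the environment variables listed above can also be set from Python before the script starts. The sketch below parses the same "name=value;" block; the helper names `parse_env_block` and `apply_env` are hypothetical and not part of this repository.

```python
import os

# Same variables as in the IDE configuration above.
ENV_BLOCK = """
ie_test_data_path=./data/sample/sample.json;
ie_dev_data_path=./data/sample/sample.json;
ie_train_data_path=./data/sample/sample.json;
cuda_device=-1;
"""


def parse_env_block(text):
    """Turn 'name=value;' entries into a dict, ignoring blank entries."""
    variables = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, _, value = entry.partition("=")
        variables[name.strip()] = value.strip()
    return variables


def apply_env(variables):
    """Export the variables into the current process environment."""
    os.environ.update(variables)


if __name__ == "__main__":
    apply_env(parse_env_block(ENV_BLOCK))
    print(os.environ["cuda_device"])
```

After `apply_env` has run, one option is to start the script in the same process with `runpy.run_path("debug_sample.py", run_name="__main__")` under your debugger of choice.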

Data Preprocessing

The following example shows how to preprocess the CADEC data. First, download the code of Dai et al. 2020 and follow their instructions to preprocess the CADEC data, which produces 3 output files: "train.txt", "dev.txt" and "test.txt".
Next, download Stanford CoreNLP. We use "stanford-corenlp-full-2018-10-05".
Then modify the directory paths at the beginning of "preprocess_cadec.py" to match your environment, and create a "1.sh" file like

#!/bin/bash
sudo /xxx/envs/python37/bin/python "$@"

and run "1.sh preprocess_cadec.py".

Acknowledgement

We thank everyone who shared the code that helped us complete this project. This project builds mainly on the code published by Wadden et al. 2019.
