GitHub

We have modified their code to use our data and a Bert-based encoder instead of embeddings and lstm.

Changes are in the adapted* files.

CHANGELOG: adapted_main.py * removed many hyperparams that no longer make sense, added other for bert, removed downsampling of data *

adapted_reader.py * basically switched out with our partial_jsonl dataset reader

Run commands:

python adapted_main.py\
  --device "cuda:0"\
  --train-file ../data/conll2003/eng/entity.train.jsonl\
  --dev-file ../data/conll2003/eng/entity.dev.jsonl\
  --test-file ../data/conll2003/eng/entity.test.jsonl\
  --learning_rate "2e-5"\
  --l2 0.0\
  --num_epochs 10\
  --model_folder eng-c_debug\
  --bert-model roberta-base



## Better Modeling of Incomplete Annotation for Named Entity Recognition 

This repository implements an LSTM-CRF model for named entity recognition. The model is same as the one by [Lample et al., (2016)](http://www.anthology.aclweb.org/N/N16/N16-1030.pdf) except we do not have the last `tanh` layer after the BiLSTM.
The code provided is used for the paper "[Better Modeling of Incomplete Annotation for Named Entity Recognition](http://www.statnlp.org/research/ie/zhanming19naacl-ner.pdf)" published in 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (*NAACL*).

__NOTE: To extend a more general use case, a PyTorch version is implemented in this repo. The previous implementation using DyNet can be found in first release [here](https://github.com/allanj/ner_incomplete_annotation/tree/aa20c015b3f373ac4a1893e629ac8f2dd137faab). Right now, I have implemented a the "hard" approach as in the paper. "Soft" approach would be coming soon.__

Our codebase is built based on the [pytorch LSTM-CRF](https://github.com/allanj/pytorch_lstmcrf) repo.

### Requirements
* PyTorch >= 1.1
* Python 3

Put your dataset under the data folder. You can obtain the `conll2003` and `conll2002` datasets from other sources. We have put our collected industry datasets `ecommerce` and `youku` under the data directory. 

Also, put your embedding file under the data directory to run. You need to specify the path for the embedding file.

### Running our approaches
```bash
python3 main.py --embedding_file ${PATH_TO_EMBEDDING} --dataset conll2003 --variant hard

Change hard to soft for our soft variant. (This version actually also supports using contextual representation. But I'm still testing during this weekend.)

Future Work

add soft approach
add other baselines.

Citation

If you use this software for research, please cite our paper as follows:

The implementation in our paper is implemented with DyNet. Check out our previous release.

@inproceedings{jie2019better,
  title={Better Modeling of Incomplete Annotations for Named Entity Recognition},
  author={Jie, Zhanming and Xie, Pengjun and Lu, Wei and Ding, Ruixue and Li, Linlin},
  booktitle={Proceedings of NAACL},
  year={2019}
}

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
common		common
config		config
data		data
model		model
.gitignore		.gitignore
README.md		README.md
adapted_main.py		adapted_main.py
hard.py		hard.py
main.py		main.py
run_stuff.sh		run_stuff.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

common

common

config

config

data

data

model

model

.gitignore

.gitignore

README.md

README.md

adapted_main.py

adapted_main.py

hard.py

hard.py

main.py

main.py

run_stuff.sh

run_stuff.sh

Repository files navigation

Future Work

Citation

About

Releases

Packages

Languages

teffland/ner_incomplete_annotation

Folders and files

Latest commit

History

Repository files navigation

Future Work

Citation

About

Resources

Stars

Watchers

Forks

Languages