hypernymy extraction

Code for msbd5014/Fa20-Independent project

"Hypernymy Extraction: Extensive Exploration of Several Methods" (report)

This project falls under the topic of taxonomy learning in Natural Language Processing (NLP). It explores hypernymy relation extraction in two directions: pattern-based (mostly unsupervised) and learning-based (mostly supervised). The pattern-based methods start from manual Hearst patterns, on top of which techniques such as PPMI and SVD are applied for further detection. Distributional methods based on several hypernymy similarity measures are also explored for comparison with the pattern-based ones. On the supervised side, I mainly explored term embeddings for SVM hypernymy classification, projection learning, and Bi-LSTM sequence labeling models. In evaluation, all of these learning models achieve an F1 score higher than 0.7.
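To make the starting point concrete, here is a minimal sketch of Hearst-pattern matching with plain regular expressions. It is an illustration only and not the code in hearst.py: the pattern list, the single-word term assumption, and the name `extract_pairs` are all hypothetical choices made for this example.

```python
import re

# Assumption: a term is a single lowercase word. A real pipeline would
# use POS tags or noun-phrase chunking to capture multi-word terms.
WORD = r"[a-z]+"

# Each entry: (compiled regex, hyponym group index, hypernym group index).
PATTERNS = [
    (re.compile(rf"({WORD}),? such as ({WORD})"), 2, 1),
    (re.compile(rf"({WORD}),? including ({WORD})"), 2, 1),
    (re.compile(rf"({WORD}) and other ({WORD})"), 1, 2),
    (re.compile(rf"({WORD}) or other ({WORD})"), 1, 2),
]


def extract_pairs(sentence):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    text = sentence.lower()
    pairs = []
    for regex, hypo_idx, hyper_idx in PATTERNS:
        for match in regex.finditer(text):
            pairs.append((match.group(hypo_idx), match.group(hyper_idx)))
    return pairs


if __name__ == "__main__":
    print(extract_pairs("Animals such as dogs are common."))
    print(extract_pairs("Cats and other pets sleep a lot."))
```

Only the first term after each cue phrase is captured, so enumerations like "dogs, cats and horses" would need extra handling.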

Requirements

Python 3 is required (3.7 is preferred).

Required packages: numpy, pandas, matplotlib, scikit-learn (sklearn), scipy, PyTorch, gensim, and nltk.
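As a quick sanity check before running anything, the following sketch reports which of these packages cannot be found in the current environment. It is not part of the repository; the helper name `missing_packages` is hypothetical, and the list uses import names (`sklearn` for scikit-learn, `torch` for PyTorch).

```python
import importlib.util

# Import names, which differ from the package names in two cases:
# scikit-learn is imported as "sklearn" and PyTorch as "torch".
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn",
            "scipy", "torch", "gensim", "nltk"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be located, without importing them."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```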

commands.txt lists the commands needed to get started.

Demo

See the main hypernymy extraction results in the ./Results folder; the report shows only the evaluation results.

Some of the evaluation scores are shown below.

Performance

The code covers six parts: a. Manual Hearst patterns; b. Pattern-based methods; c. Distributional methods; d. Term embeddings; e. Projection learning; f. Bi-LSTM sequence labeling.
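For part b, the PPMI reweighting mentioned in the introduction can be sketched as follows. This is a minimal illustration under assumptions, not the code in patternBased: the input is assumed to be a dictionary of raw (hyponym, hypernym) pair counts, the function name `ppmi` is hypothetical, and the SVD step is not shown. The score is PPMI(x, y) = max(0, log(p(x, y) / (p(x) p(y)))), with probabilities estimated from the counts.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Map each (hyponym, hypernym) pair to its positive PMI score.

    pair_counts: dict of (hyponym, hypernym) -> raw count (all counts > 0).
    """
    total = sum(pair_counts.values())
    hypo_totals = Counter()   # how often each term occurs as a hyponym
    hyper_totals = Counter()  # how often each term occurs as a hypernym
    for (hypo, hyper), count in pair_counts.items():
        hypo_totals[hypo] += count
        hyper_totals[hyper] += count

    scores = {}
    for (hypo, hyper), count in pair_counts.items():
        # p(x, y) / (p(x) * p(y)) simplifies to count * total / (row * col).
        ratio = (count * total) / (hypo_totals[hypo] * hyper_totals[hyper])
        scores[(hypo, hyper)] = max(0.0, math.log(ratio))
    return scores


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    counts = {("dog", "animal"): 9, ("dog", "tool"): 1,
              ("hammer", "animal"): 1, ("hammer", "tool"): 9}
    for pair, score in sorted(ppmi(counts).items()):
        print(pair, round(score, 3))
```

Pairs that co-occur less often than chance would predict are clipped to zero, which is what distinguishes PPMI from plain PMI.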

Each module mainly consists of data, preprocess, model, and evaluator parts. An Untitled.ipynb notebook under each part gives a quick check of how that module works.
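For readers unfamiliar with how an evaluator might score extracted pairs, here is a minimal sketch of precision, recall, and F1 over sets of (hyponym, hypernym) pairs. The function name and the set-based input format are assumptions for illustration; the evaluators in this repository may be organized differently.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for predicted vs. gold pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up pairs, for illustration only.
    predicted = {("dog", "animal"), ("cat", "animal"), ("car", "animal")}
    gold = {("dog", "animal"), ("cat", "animal"), ("rose", "flower")}
    print(precision_recall_f1(predicted, gold))
```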

Training

Here are plots from some selected training runs.

Projection learning training loss (left)
Bi-LSTM sequence labeling model training score (right)


WillaFan/hypernymy-extraction
