GitHub - tttthomasssss/acl2017: Code and resources of the ACL 2017 paper "Improving Semantic Composition with Offset Inference"

IMPORTANT NOTE

The file size of the APT representations used in the paper is too large to host them on GitHub.

Please email the first author of the paper (either using the email address on this GitHub repository or the email address in the paper) if you would like to have them.

Dependencies

The code relies on several third-party libraries:

numpy
scipy
scikit-learn
dill
joblib
nltk
sqlitedict

In addition, the following code (which has its own dependencies) is required for performing distributional inference:

DiscoUtils: https://github.com/tttthomasssss/DiscoUtils

Installation

Apart from DiscoUtils, which needs to be installed manually, all requirements and the codebase itself can be installed with:

cd path/to/apt-toolkit
pip install -e .
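
After installation, a quick check along the following lines can confirm that the dependencies listed above are importable. This is only a sketch: the helper `find_missing` is made up for illustration and is not part of the toolkit, and the import names are assumed to be `sklearn` for scikit-learn, plus `discoutils` and `apt_toolkit` as they appear in the usage examples in this README.

```python
import importlib.util

# Import names assumed from the dependency list and the usage examples
REQUIRED_MODULES = [
    'numpy', 'scipy', 'sklearn', 'dill', 'joblib',
    'nltk', 'sqlitedict', 'discoutils', 'apt_toolkit',
]


def find_missing(modules):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print('Missing modules:', ', '.join(missing) if missing else 'none')
```

If `discoutils` is reported as missing, it still needs to be installed manually from the repository linked under Dependencies.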

Resources

The vectors from the paper "Improving Semantic Composition with Offset Inference" will be posted on GitHub; unfortunately, they were too large for the ACL submission system.

Usage

Loading vectors:

from apt_toolkit.utils import vector_utils

vectors = vector_utils.load_vector_cache('path/to/vectors', filetype='dill') # Loads the higher-order dependency-typed vectors as a `dict` of `dicts`
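
To illustrate the `dict` of `dicts` layout mentioned in the comment above, here is a small sketch that uses hand-made toy data instead of the real vectors. The feature strings (such as `amod:large`) and the weights are invented for illustration and may not match the toolkit's actual feature encoding; `top_features` is a hypothetical helper, not a toolkit function.

```python
# Toy stand-in for the loaded lexicon: word -> {feature -> weight}
# (feature names and weights are made up for illustration)
toy_vectors = {
    'quantity': {'amod:large': 3.0, 'amod:small': 1.0, '!dobj:measure': 2.0},
    'large': {'!amod:quantity': 3.0, 'advmod:very': 1.5},
}


def top_features(vectors, word, k=2):
    """Return the k highest-weighted (feature, weight) pairs of a word."""
    features = vectors.get(word, {})
    return sorted(features.items(), key=lambda item: item[1], reverse=True)[:k]


print(top_features(toy_vectors, 'quantity'))
# [('amod:large', 3.0), ('!dobj:measure', 2.0)]
```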

Creating Offset Representations:

from apt_toolkit.composition import mozart
from apt_toolkit.distributional_inference import distributional_inference
from apt_toolkit.utils import vector_utils

from discoutils.thesaurus_loader import Vectors

'''
Loads a standard set of vectorised APTs, creates some offsets and adds them back to the lexicon.
This is one way, and perhaps the simplest way, of creating neighbours for offsets.
'''

# Load Vectors
vectors = vector_utils.load_vector_cache('path/to/vectors', filetype='dill')

noun_vector = vectors['quantity']
adj_vector = vectors['large']

# Offset and add the resulting representation back to the lexicon 
offset_vector_adj = vector_utils.create_offset_vector(adj_vector, 'amod')
vectors['__OFFSET_amod_large'] = offset_vector_adj

# Can also offset the noun vector or any other vectors and add them back to the lexicon
offset_vector_noun = vector_utils.create_offset_vector(noun_vector, '!amod')
vectors['__OFFSET_!amod_quantity'] = offset_vector_noun

# Store vectors back to file
vector_utils.save_vector_cache(vectors, 'path/to/offsets', filetype='dill')
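
For intuition about what offsetting does, the following self-contained sketch mimics the idea on toy data. It is not the implementation of `create_offset_vector`. It assumes features are strings of the form `path:word`, that relations in a path are separated by `.`, and that `!` marks an inverse relation; all of these are illustrative assumptions, as are the helper names. Offsetting by a relation prepends it to every path and cancels it against a leading inverse of the same relation.

```python
def invert(rel):
    """Toggle the assumed inverse marker '!' on a relation name."""
    return rel[1:] if rel.startswith('!') else '!' + rel


def offset_feature(feature, offset_rel):
    """Shift one 'path:word' feature by offset_rel (toy encoding, assumed)."""
    path, _, word = feature.rpartition(':')
    rels = path.split('.') if path else []
    if rels and rels[0] == invert(offset_rel):
        rels = rels[1:]              # offset cancels against its inverse
    else:
        rels = [offset_rel] + rels   # otherwise the path gets longer
    return '.'.join(rels) + ':' + word


def toy_offset(vector, offset_rel):
    """Apply offset_feature to every feature, summing weights on collisions."""
    result = {}
    for feature, weight in vector.items():
        key = offset_feature(feature, offset_rel)
        result[key] = result.get(key, 0.0) + weight
    return result


# Toy adjective vector (made-up features and weights)
large = {'!amod:quantity': 4.0, '!amod.!dobj:buy': 1.5, 'advmod:very': 2.0}
print(toy_offset(large, 'amod'))
# {':quantity': 4.0, '!dobj:buy': 1.5, 'amod.advmod:very': 2.0}
```

In this toy view, the offset adjective ends up with noun-like features (for example `!dobj:buy`), which is what makes it comparable to nouns.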

Composing Vectors with Offset Inference:

from apt_toolkit.composition import mozart
from apt_toolkit.distributional_inference import distributional_inference
from apt_toolkit.utils import vector_utils

# Load Vectors with previously added offset representations
vectors = vector_utils.load_vector_cache('path/to/offsets', filetype='dill')

adj_vector = vectors['exciting']

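# Use offset inference to enrich the noun-view of the adjective "exciting"
exciting_offset = vector_utils.create_offset_vector(adj_vector, 'amod')
# Make sure the offset is in the lexicon under the key used below (skip if it was already added and stored earlier)
vectors['__OFFSET_amod_exciting'] = exciting_offset
rich_exciting_offset = distributional_inference.static_top_n(vectors=vectors, words=['__OFFSET_amod_exciting'], num_neighbours=20)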

# Use the standard DI algorithm to infer unobserved co-occurrence features for the noun "book"
rich_book = distributional_inference.static_top_n(vectors=vectors, words=['book'], num_neighbours=20)

# Now the two vectors can be composed
composed_vector = mozart.intersect_apts(rich_exciting_offset, rich_book)
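
The two steps above, neighbour-based inference followed by composition, can be sketched on toy data as follows. This is a conceptual illustration only and makes several assumptions that are not confirmed by this README: neighbours are ranked by cosine similarity, neighbour vectors are merged into the target by summing weights, and intersection keeps shared features with the smaller of the two weights. The function names are hypothetical and do not correspond to `static_top_n` or `intersect_apts`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in set(u) & set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def toy_top_n_inference(vectors, word, num_neighbours):
    """Enrich a word with the features of its top-n neighbours (assumed: sum merge)."""
    target = vectors[word]
    ranked = sorted(
        (w for w in vectors if w != word),
        key=lambda w: cosine(target, vectors[w]),
        reverse=True,
    )
    enriched = dict(target)
    for neighbour in ranked[:num_neighbours]:
        for feature, weight in vectors[neighbour].items():
            enriched[feature] = enriched.get(feature, 0.0) + weight
    return enriched


def toy_intersect(u, v):
    """Keep only shared features (assumed: pointwise minimum of the weights)."""
    return {f: min(u[f], v[f]) for f in set(u) & set(v)}


# Toy lexicon with made-up features and weights
toy_vectors = {
    'book': {'amod:good': 2.0, '!dobj:read': 3.0},
    'novel': {'amod:good': 1.0, '!dobj:read': 2.0, 'amod:gripping': 1.0},
    'car': {'!dobj:drive': 4.0},
}
toy_exciting_offset = {'!dobj:read': 1.0, 'amod.advmod:very': 2.0}

rich_book = toy_top_n_inference(toy_vectors, 'book', num_neighbours=1)
print(rich_book)
# {'amod:good': 3.0, '!dobj:read': 5.0, 'amod:gripping': 1.0}
print(toy_intersect(toy_exciting_offset, rich_book))
# {'!dobj:read': 1.0}
```

The point of the inference step shows up in `rich_book`: the feature `amod:gripping` was never observed with "book" in the toy data but is inferred from its neighbour "novel", so it can survive a later intersection.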
