Skip to content

ceramisch/minimantics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

61 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Minimalist distributional semantics by Carlos Ramisch Sept 11, 2014

IMPORTANT: build_profiles.c had a serious and mysterious bug, corrected on Jun 1 2015, please update ASAP!

Requires NLTK wordnet for evaluation Requires GNU C compiler with pthreads support (standard in Linux)

We will run on example mini.1, containing verb-noun pairs as extracted from the corpus, no filtering or counting, no sorting.

Does not support triple-based measures as standard Lin.

Running:

# compile the programs
make                        

# Filter out verbs/nouns freq < 10, pairs freq < 2
./filterRaw.sh mini.1 10 2   

# Calculate target-context association scores
./build_profiles mini.1.s.filter.t10.c10.tc2.u > mini.1.profiles

# Find distributional neighbors (very fast, loads of options). Use -h for doc
./calculate_similarity -S 0.2 mini.1.profiles > mini.1.sim-th0.2

# Evaluate based on Wordnet, Moby, Levin, etc.
cut -d "\t" -f 1,2,3,4,5 mini.1.sim-th0.2 | tail -n +2 >  mini.1.sim-th0.2.cosine ./evalAll-perverb.sh mini.1.sim-th0.2.cosine # -> HAS BUGS

# Generalize contexts of similar targets, get new pairs. Use -h for doc
./generalize -S 0.5 mini.1.sim-th0.2 mini.1.profiles -> NOT FINISHED

Minimantics also provides some scripts for evaluation of the output thesaurus:

# Adding a "wordnet path_similarity" column
./minimantics-sort-output.sh mini.1.sim-th0.2 | head -n 100 | ./add_wnpath.py -k10 v >mini.1.wnpath

# Eval and print averages for the 'wnpath' column
cat mini.1.wnpath | ./csv_statistics.py 'wnpath' -d target --print-global

# Taking a minimantics-style CSV and using the `cosine` field to solve TOEFL
cat wbst-nanews.v.test | ./solve_toefl.py -a mini.1.sim-th0.2 'cosine'

# Checking how TOEFL is being solved
head wbst-nanews.v.test | ./solve_toefl.py -as mini.1.sim-th0.2 'cosine'

# Seeing for which lines we have data in wbst-nanews.v.test
cat wbst-nanews.v.test | ./solve_toefl.py -s mini.1.sim-th0.2 'cosine' | grep $'\t[^?]'

About

A minimal package of C programs to build classical distributional semantic models

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published