Abydos

License: GPL v3


Abydos NLP/IR library
Copyright 2014-2018 by Chris Little

Abydos is a library of phonetic algorithms, string distance metrics, stemmers, and keyers, including:

  • Phonetic algorithms
    • Robert C. Russell's Index
    • American Soundex
    • Refined Soundex
    • Daitch-Mokotoff Soundex
    • Kölner Phonetik
    • NYSIIS
    • Match Rating Algorithm
    • Metaphone
    • Double Metaphone
    • Caverphone
    • Alpha Search Inquiry System
    • Fuzzy Soundex
    • Phonex
    • Phonem
    • Phonix
    • SfinxBis
    • phonet
    • Standardized Phonetic Frequency Code
    • Statistics Canada
    • Lein
    • Roger Root
    • Beider-Morse Phonetic Matching
  • String distance metrics
    • Levenshtein distance (incl. a [0, 1] normalized variant)
    • Optimal String Alignment distance (incl. a [0, 1] normalized variant)
    • Damerau-Levenshtein distance (incl. a [0, 1] normalized variant)
    • Hamming distance (incl. a [0, 1] normalized variant)
    • Tversky index
    • Sørensen–Dice coefficient & distance
    • Jaccard similarity coefficient & distance
    • overlap similarity & distance
    • Tanimoto coefficient & distance
    • Minkowski distance & similarity (incl. a [0, 1] normalized option)
    • Manhattan distance & similarity (incl. a [0, 1] normalized option)
    • Euclidean distance & similarity (incl. a [0, 1] normalized option)
    • Chebyshev distance & similarity (incl. a [0, 1] normalized option)
    • cosine similarity & distance
    • Jaro distance
    • Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
    • Longest common substring
    • Ratcliff-Obershelp similarity & distance
    • Match Rating Algorithm similarity
    • Normalized Compression Distance (NCD) & similarity
    • Monge-Elkan similarity & distance
    • Matrix similarity
    • Needleman-Wunsch score
    • Smith-Waterman score
    • Gotoh score
    • Length similarity
    • Prefix, Suffix, and Identity similarity & distance
    • Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
    • Bag distance (incl. a [0, 1] normalized variant)
    • Editex distance (incl. a [0, 1] normalized variant)
  • Stemmers
    • the Lovins stemmer
    • the Porter and Porter2 (Snowball English) stemmers
    • Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
    • CLEF German, German plus, and Swedish stemmers
    • Caumanns' German stemmer
  • Keyers
    • string fingerprint
    • q-gram fingerprint
    • phonetic fingerprint
    • skeleton key
    • omission key
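
To make two of the listed techniques concrete, here is a minimal, self-contained Python 3 sketch of American Soundex and of Levenshtein distance with a [0, 1] normalized variant. It is an independent illustration and does not use or reproduce Abydos's own code; the function names and the choice to normalize by the longer string's length are assumptions made for this example.

```python
# Illustrative sketch only: not Abydos's implementation or API.

# Consonant groups used by American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(word, length=4):
    """Return the American Soundex code of `word`, zero-padded to `length`."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (code + "0" * length)[:length]


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """Scale the distance to [0, 1] by the longer length (an assumption)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


print(soundex("Robert"))                                       # R163
print(soundex("Ashcraft"))                                     # A261
print(levenshtein("kitten", "sitting"))                        # 3
print(round(normalized_levenshtein("kitten", "sitting"), 3))   # 0.429
```

Names that sound alike, such as "Robert" and "Rupert", receive the same Soundex code, while the normalized distance makes scores comparable across string pairs of different lengths.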

Required:

  • NumPy

Recommended:

  • PylibLZMA (Python 2 only; used for the LZMA compression string distance metric)

Suggested for development, testing, & QA:

  • Nose (for unit testing)
  • coverage.py (for code coverage checking)
  • Pylint (for code quality checking)
  • PEP8 (for code quality checking)

Installation

To install Abydos from PyPI using pip:

pip install abydos

It should run on Python 2.7 and Python 3.3+.

To build, install, and run the unit tests from source in Python 2:

sudo python setup.py install
nosetests -v --with-coverage --cover-erase --cover-html --cover-branches --cover-package=abydos .

To build, install, and run the unit tests from source in Python 3:

sudo python3 setup.py install
nosetests3 -v --with-coverage --cover-erase --cover-html --cover-branches --cover-package=abydos .

For pylint testing, run:

pylint --rcfile=pylint.rc abydos > pylint.log

A simple shell script is also included. It builds, installs, and tests the package, runs the code-quality checks (Pylint & PEP8), and builds the documentation. To run it, execute:

./btest.sh
