esbullington/python-ngram

The ngram module offers string similarity calculation and approximate string matching based on N-Grams.

Here is the documentation and the tutorial.

How does it work?

The NGram class extends the Python set class with the ability to search for set members ranked by their N-Gram string similarity to the query. There are also methods for comparing a pair of strings.

The set stores arbitrary items by using a specified "key" function to produce the string representation of set members for the n-gram indexing.

N-grams are obtained by splitting strings into overlapping substrings of N (usually N=3) characters in length.
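As a rough illustration of the splitting step, here is a minimal sketch in plain Python. The function name `char_ngrams` is hypothetical and is not part of the ngram module. No special handling of the string ends is applied here, and the library itself may treat them differently.

```python
def char_ngrams(text, n=3):
    """Return the overlapping n-character substrings of text, in order.

    Hypothetical helper for illustration only; not the ngram module's API.
    Strings shorter than n yield an empty list.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    # Each window starts one character after the previous one.
    print(char_ngrams("spam"))
    print(char_ngrams("hello", n=2))
```

Each substring overlaps its neighbour by N-1 characters, which is what lets two strings with a small edit between them still share most of their N-grams.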

To find items similar to a query string, the NGram class splits the query into N-grams, collects all items sharing at least one N-gram with the query, and ranks the items by score based on the ratio of shared to unshared N-grams between the two strings.
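The search procedure described above can be sketched in a few lines of plain Python. This is a simplified stand-in and not the library's implementation: the class name `TinyNGramIndex` and the helper `char_ngrams` are hypothetical, and the score used here (shared N-grams divided by all distinct N-grams of the two strings) is an assumed reading of "ratio of shared to unshared N-grams". The `key` argument plays the role of the key function that turns an arbitrary item into the string to be indexed.

```python
from collections import defaultdict


def char_ngrams(text, n=3):
    """Return the set of overlapping n-character substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class TinyNGramIndex:
    """Hypothetical, simplified index; not the ngram module's NGram class."""

    def __init__(self, items=(), key=str, n=3):
        self.key = key                  # maps an item to its string form
        self.n = n
        self.items = set()
        self.index = defaultdict(set)   # n-gram -> items containing it
        for item in items:
            self.add(item)

    def add(self, item):
        self.items.add(item)
        for gram in char_ngrams(self.key(item), self.n):
            self.index[gram].add(item)

    def search(self, query):
        """Return (item, score) pairs, best match first."""
        query_grams = char_ngrams(query, self.n)
        # Candidates are items sharing at least one n-gram with the query.
        candidates = set()
        for gram in query_grams:
            candidates |= self.index.get(gram, set())
        results = []
        for item in candidates:
            item_grams = char_ngrams(self.key(item), self.n)
            shared = len(query_grams & item_grams)
            total = len(query_grams | item_grams)
            # Assumed score: fraction of distinct n-grams that are shared.
            results.append((item, shared / total))
        return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    records = [(1, "spam"), (2, "spams"), (3, "eggs")]
    index = TinyNGramIndex(records, key=lambda record: record[1])
    for item, score in index.search("spam"):
        print(item, round(score, 3))
```

Items that share no N-gram with the query never become candidates, so they are not scored at all. That is why the inverted index from N-gram to items keeps a search from having to compare the query against every member of the set.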

Credits

The starting point was the Perl String::Trigram module by Tarek Ahmed. In 2007, Michel Albert (exhuma) wrote the ngram module and submitted 2.0.0b2 to SourceForge. Since late 2008, python-ngram has been developed by Graham Poulter, who added features, documentation, performance improvements and Python 3 support.
