steven-s/text-shingles

Text Shingles Library

This is a Python 3 library for measuring the similarity of pieces of text. Each text is broken into k-shingles, and a MinHash signature is generated from those shingles; signatures are then compared to estimate similarity.

API

To represent text in MinHash form, create a new ShingledText instance and pass in the text. Three optional arguments control the result: random_seed, the seed used for hashing (default 5); shingle_length, the k in k-shingles (default 5); and minhash_size, the length of the MinHash signature (default 200). The object exposes the MinHash signature as a list and the shingles as an iterator. A similarity function is also available to estimate the Jaccard similarity of two texts from their MinHash signatures.
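The following is a minimal, standalone sketch of the technique described above, not this library's implementation. It assumes word-level shingles split on whitespace (the library uses NLTK) and substitutes salted BLAKE2b from the standard library for Murmur Hash. The function names shingles, minhash, and similarity are hypothetical and chosen only for illustration; the defaults mirror the ones listed above.

```python
import hashlib
import random


def shingles(text, k=5):
    """Return the set of k-word shingles in text.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    tokens = text.lower().split()
    if not tokens:
        return set()
    if len(tokens) < k:
        # Text shorter than k words yields a single, shorter shingle.
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(shingle, salt):
    """64-bit hash of a shingle under a salt (stand-in for a seeded Murmur Hash)."""
    digest = hashlib.blake2b(
        shingle.encode("utf-8"), digest_size=8, salt=salt
    ).digest()
    return int.from_bytes(digest, "big")


def minhash(shingle_set, size=200, seed=5):
    """Return a MinHash signature: the minimum hash value under each of `size` salts."""
    if not shingle_set:
        raise ValueError("cannot build a MinHash signature from an empty shingle set")
    rng = random.Random(seed)
    salts = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(size)]
    return [min(_hash(s, salt) for s in shingle_set) for salt in salts]


def similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    a = minhash(shingles("the quick brown fox jumps over the lazy dog near the river"))
    b = minhash(shingles("the quick brown fox jumps over the lazy cat near the river"))
    print(similarity(a, b))
```

Two signatures are only comparable when they were built with the same seed and signature size, because each position must correspond to the same hash function. A larger signature size reduces the variance of the estimate at the cost of more hashing.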

Requirements

This library requires Python 3, NLTK, and Murmur Hash.

About

k-shingling for text to help compare similarity
