katakwar86/dup-image-search

dup-image-search

This project helps the Internet Archive find duplicates among its many images (particularly music album cover art).

Algorithms used

Thanks to http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html for the Simple Hash and pHash (perceptual hash, which uses the DCT) algorithms.

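  • MD5 checksum (this may change to SHA)
    Detects byte-for-byte identical files
  • Simple Hash
    Scale to 8x8 and convert to greyscale; each hash bit records whether a pixel is above or below the average
  • DCT (Discrete Cosine Transform)
    Scale to 32x32, convert to greyscale, and apply the DCT (https://en.wikipedia.org/wiki/Discrete_cosine_transform); each hash bit records whether a coefficient is above or below the average, with the top-left "base" (DC) value excluded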
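The three checks above can be sketched in pure Python as follows. This is a minimal illustration, not this repository's code: it assumes pixels are already decoded into rows of (r, g, b) tuples (in practice an imaging library would decode and resize), and every function name here is hypothetical. The box-filter resize, ITU-R BT.601 greyscale weights, orthonormal DCT-II, keeping only the top-left 8x8 low-frequency block (as the hackerfactor post describes), and comparing hashes by Hamming distance are all assumptions made for the sketch.

```python
# Illustrative sketch only; names and design choices are hypothetical.
import hashlib
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # greyscale pixel values, row-major
RGB = Tuple[int, int, int]


def md5_checksum(data: bytes) -> str:
    """Exact-duplicate check: byte-identical files share a digest."""
    return hashlib.md5(data).hexdigest()


def to_greyscale(pixels: Sequence[Sequence[RGB]]) -> Grid:
    """Convert RGB rows to luma using ITU-R BT.601 weights (assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in pixels]


def downscale(grid: Grid, size: int) -> Grid:
    """Box-filter resize to size x size; assumes the grid is at least that large."""
    h, w = len(grid), len(grid[0])
    out = []
    for i in range(size):
        y0, y1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            x0, x1 = j * w // size, (j + 1) * w // size
            cell = [grid[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def bits_above(values: List[float], mean: float) -> int:
    """Pack one bit per value (1 if above the mean); first value is the highest bit."""
    h = 0
    for v in values:
        h = (h << 1) | (1 if v > mean else 0)
    return h


def simple_hash(grey: Grid) -> int:
    """64-bit average hash: 8x8 thumbnail, each pixel compared with the mean."""
    values = [v for row in downscale(grey, 8) for v in row]
    return bits_above(values, sum(values) / len(values))


def dct_1d(vec: List[float]) -> List[float]:
    """Orthonormal DCT-II of one row or column."""
    n = len(vec)
    out = []
    for u in range(n):
        s = sum(vec[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n))
        out.append(s * (math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)))
    return out


def dct_2d(grid: Grid) -> Grid:
    """Separable 2-D DCT: rows, then columns. Result is [v][u] with DC at [0][0]."""
    rows = [dct_1d(row) for row in grid]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def dct_hash(grey: Grid) -> int:
    """64-bit DCT hash: 32x32 thumbnail, keep the low-frequency top-left 8x8 block."""
    coeffs = dct_2d(downscale(grey, 32))
    low = [coeffs[y][x] for y in range(8) for x in range(8)]
    mean = sum(low[1:]) / (len(low) - 1)  # exclude the DC "base" value from the mean
    return bits_above(low, mean)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests a near-duplicate."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # Synthetic 64x64 image: left half black, right half white.
    step = [[0.0] * 32 + [255.0] * 32 for _ in range(64)]
    brighter = [[v + 10 for v in row] for row in step]
    print(hex(simple_hash(step)))                       # 0xf0f0f0f0f0f0f0f
    print(hamming(dct_hash(step), dct_hash(brighter)))  # 0
```

A uniform brightness shift leaves both perceptual hashes unchanged here, because every value moves by the same amount as the average it is compared against (and, for the DCT, only the DC coefficient changes). MD5 would report such a pair as different files, which is why the exact checksum and the perceptual hashes complement each other. Where to draw the Hamming-distance cutoff for "duplicate" is a tuning choice this sketch does not make.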

About

Help find similar duplicate images for the Internet Archive

Languages

  • Python 57.1%
  • TeX 41.1%
  • Shell 1.8%