iHashDNA

A perceptual hashing library in Python (backed by Redis); a wannabe PhotoDNA.

What is Perceptual Hashing

Perceptual hashing is the use of an algorithm that produces a snippet or fingerprint of various forms of multimedia.[1][2] Perceptual hash functions are analogous if features of the multimedia are similar, whereas cryptographic hashing relies on the avalanche effect of a small change in input value creating a drastic change in output value. Perceptual hash functions are widely used in finding cases of online copyright infringement as well as in digital forensics because of the ability to have a correlation between hashes so similar data can be found (for instance with a differing watermark). Based on research at Northumbria University,[3] it can also be applied to simultaneously identify similar contents for video copy detection and detect malicious manipulations for video authentication. The system proposed performs better than current video hashing techniques in terms of both identification and authentication.

Wikipedia, Perceptual Hashing

TLDR: How Perceptual Hashing works

Pic Source: Why we created 'Imageid' and saved 47% of the moderation effort | by Diego Essaya | Taringa! | Medium

Perceptual hashing degrades an image, reducing it to coarse "pixels", and converts the result into a binary (or hexadecimal) sequence. Unlike cryptographic hashing, perceptual hashing lacks the avalanche effect: a small change in the image produces only a small change in the hash, so similar images end up with similar hashes.
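To make the contrast with cryptographic hashing concrete, here is a minimal sketch. It uses a toy average hash (one bit per pixel, compared against the mean brightness) on a hypothetical 4x4 grayscale grid. This is not the phash or whash algorithm the library relies on, only an illustration of the missing avalanche effect.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: '1' where a pixel is brighter than the mean, else '0'."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)

def hamming(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

def sha256_bits(pixels):
    """SHA-256 of the raw pixel bytes, as a 256-character bit string."""
    data = bytes(value for row in pixels for value in row)
    return "".join(f"{byte:08b}" for byte in hashlib.sha256(data).digest())

# Hypothetical 4x4 grayscale image (0 = black, 255 = white).
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same image, uniformly brightened by 10.
brightened = [[value + 10 for value in row] for row in original]

perceptual_distance = hamming(average_hash(original), average_hash(brightened))
crypto_distance = hamming(sha256_bits(original), sha256_bits(brightened))

print("perceptual hash bits changed:", perceptual_distance)
print("SHA-256 bits changed:", crypto_distance)
```

The brightened image keeps the same toy perceptual hash, while the SHA-256 digest of its pixels changes in many bit positions.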

What iHashDNA does

It uses two perceptual hash functions, phash and whash: an image is checked with phash first, then with whash.

Combining these two with a database (Redis) gives you this library.
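One possible reading of "phash first, then whash" is a two-stage lookup with whash as a fallback. The sketch below assumes that reading; the function name `is_known`, the integer-encoded hash pairs, and the `max_distance` tolerance are all hypothetical, and a plain list stands in for Redis. The library's actual matching logic may differ.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")

def is_known(candidate, stored, max_distance=4):
    """Return which stage matched ('phash' or 'whash'), or None.

    `candidate` and every entry of `stored` are hypothetical (phash, whash)
    pairs of integers; `max_distance` is an assumed tolerance in bits.
    """
    cand_p, cand_w = candidate
    # Stage 1: compare the phash values.
    if any(hamming(cand_p, p) <= max_distance for p, _ in stored):
        return "phash"
    # Stage 2: only reached when phash found nothing.
    if any(hamming(cand_w, w) <= max_distance for _, w in stored):
        return "whash"
    return None

# Made-up 16-bit hashes, for illustration only.
stored = [(0b1111000011110000, 0b1010101010101010)]
near_phash = (0b1111000011110001, 0b0000000000000000)  # phash is 1 bit off
near_whash = (0b0000111100001111, 0b1010101010101000)  # only whash is close
unrelated = (0b0000111100001111, 0b0101010101010101)   # both are far

print(is_known(near_phash, stored))
print(is_known(near_whash, stored))
print(is_known(unrelated, stored))
```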

You can:

Ban images: add the hash of the image to the DB (after checking whether it is already there). This includes rotations of the picture (90 degrees left, 90 degrees right, 180 degrees upside down);
Unban images: remove the hash and all similar hashes from the DB;
Whitelist images: ignore a picture's hash.
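The three operations above can be sketched as follows. This is an illustration under stated assumptions, not the library's API: the class `BanList` and its method names are hypothetical, Python sets stand in for Redis, `tiny_hash` is a toy replacement for phash/whash, and unbanning removes only exact rotation hashes rather than all similar ones.

```python
def rotate_right(grid):
    """Rotate a 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def tiny_hash(grid):
    """Toy stand-in for a perceptual hash: one bit per pixel vs. the mean."""
    flat = [value for row in grid for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)

def rotation_hashes(grid):
    """Hashes of the image rotated by 0, 90, 180 and 270 degrees."""
    hashes = set()
    for _ in range(4):
        hashes.add(tiny_hash(grid))
        grid = rotate_right(grid)
    return hashes

class BanList:
    """In-memory stand-in for the Redis-backed store (hypothetical API)."""

    def __init__(self):
        self.banned = set()
        self.whitelisted = set()

    def ban(self, grid):
        """Store all rotation hashes; return False if already banned."""
        hashes = rotation_hashes(grid)
        if hashes & self.banned:
            return False
        self.banned |= hashes
        return True

    def unban(self, grid):
        """Remove the image's rotation hashes from the ban set."""
        self.banned -= rotation_hashes(grid)

    def whitelist(self, grid):
        """Mark this exact hash as one to ignore."""
        self.whitelisted.add(tiny_hash(grid))

    def is_banned(self, grid):
        image_hash = tiny_hash(grid)
        return image_hash not in self.whitelisted and image_hash in self.banned

image = [[0, 255], [0, 0]]  # hypothetical 2x2 grayscale image
bans = BanList()
first = bans.ban(image)    # newly added
second = bans.ban(image)   # already present
rotated_hit = bans.is_banned(rotate_right(image))
print(first, second, rotated_hit)
```

Storing the rotated hashes at ban time keeps lookups cheap: a rotated copy is found with a single hash comparison instead of four.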

Practical examples

Perceptual hashing is a good way to recognize two similar images. It is useful if you need to:

Quickly index similar images;
Check for prohibited content without saving it in your DB (child pornography, pornography, gore...);
Check for watermarked copies of original copyrighted content;

and more...

The library can easily detect an edited photo if it has:

Color changes;
Random garbage over it (watermarks, stickers...);
Slight cropping.
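As a rough illustration of why such edits remain detectable, the sketch below compares a toy hash of an image with a "stickered" copy and with an unrelated picture. The images and the `tiny_hash` function are hypothetical; a real phash or whash works on a resized image and is more robust than this toy version.

```python
def tiny_hash(grid):
    """Toy perceptual hash: one boolean per pixel (brighter than the mean?)."""
    flat = [value for row in grid for value in row]
    mean = sum(flat) / len(flat)
    return [value > mean for value in flat]

def bit_distance(a, b):
    """Number of positions where two hashes disagree."""
    return sum(x != y for x, y in zip(a, b))

original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]

# Hypothetical edit: a white "sticker" covering the top-left pixel.
stickered = [row[:] for row in original]
stickered[0][0] = 255

# Hypothetical unrelated picture: horizontal bands instead of vertical ones.
different = [
    [10, 20, 15, 25],
    [12, 22, 18, 28],
    [200, 210, 205, 215],
    [202, 212, 208, 218],
]

sticker_distance = bit_distance(tiny_hash(original), tiny_hash(stickered))
different_distance = bit_distance(tiny_hash(original), tiny_hash(different))
print(sticker_distance, different_distance)
```

The stickered copy differs from the original in 1 of 16 bits, while the unrelated picture differs in 8, so a small distance threshold separates the two cases.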

Issues and limitations

Remember that this is not ML-based.

It can be easily bypassed by cropping the image heavily; only slight cropping is tolerated.

Here you will find an interesting article that evaluates the various functions of perceptual hashing.

This library is a wannabe PhotoDNA.

How to use it

Requirements

Install Redis
Start Redis
git clone https://github.com/matteounitn/iHashDNA.git
cd into the cloned folder
(Optional) Create a venv:

python3 -m venv venv && source venv/bin/activate
pip3 install -r requirements.txt

Then you are good to go!

Example

Check out this example (example.py in the repository).
