
barrycarey/Imgur-Repost-Detection-Bot


Imgur Repost Detection Bot

This Python script acts as an Imgur bot that detects reposted content.

While running, it pulls all new images from Usersub, calculates a hash for each one (using Dhash to produce 16, 64, and 256-bit hashes), and stores the hashes in a MySQL database.

All hashes for new images are added to a queue, which is worked through by a process pool. The number of processes is configurable via the ini.
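
As a rough illustration of the queue and process pool described above, here is a minimal sketch using Python's standard multiprocessing module. The function names (find_matches, check_queue), the in-memory dictionary of known hashes, and the sample values are all assumptions made for this example; the actual bot reads its hashes from MySQL and may be organised differently.

```python
from functools import partial
from multiprocessing import Pool


def find_matches(new_image, known_hashes, max_distance):
    """Return (image_id, [ids of stored images closer than max_distance bits])."""
    image_id, image_hash = new_image
    matches = [
        known_id
        for known_id, known_hash in known_hashes.items()
        # XOR leaves a 1 bit wherever the two hashes differ.
        if bin(image_hash ^ known_hash).count("1") < max_distance
    ]
    return image_id, matches


def check_queue(queue, known_hashes, max_distance, pool_size):
    """Compare every queued hash against the known hashes using a process pool."""
    worker = partial(find_matches, known_hashes=known_hashes, max_distance=max_distance)
    with Pool(processes=pool_size) as pool:
        return dict(pool.map(worker, queue))


if __name__ == "__main__":
    # Hypothetical data: ids and 16-bit hashes invented for the example.
    known = {"old1": 0x0F50, "old2": 0xFFFF}
    queue = [("new1", 0x0F58), ("new2", 0x0000)]
    print(check_queue(queue, known, max_distance=3, pool_size=2))
```

A larger pool_size spreads the comparisons over more CPU cores, which is the trade-off the pool size setting in the ini controls.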


How Does It Detect Reposted Content?

For each image a hash is generated using the Dhash algorithm and stored in the database.

We can then check the hash of a new image against existing hashes using Hamming distance. If the Hamming distance is less than the threshold set in the config, the new image is flagged as a repost.
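
The comparison above can be sketched in a few lines of Python. This is a simplified illustration, not the bot's own code: it assumes the image has already been converted to grayscale and shrunk to a grid with n rows and n + 1 columns (so n = 4, 8, or 16 gives a 16, 64, or 256-bit hash), and the function names are made up for the example.

```python
def dhash(gray_rows):
    """Difference hash of a grayscale grid with n rows and n + 1 columns.

    Each bit records whether a pixel is brighter than its right-hand neighbour.
    """
    bits = 0
    for row in gray_rows:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bit positions in which the two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_repost(new_hash, known_hash, threshold):
    """Flag a repost when the distance is less than the configured threshold."""
    return hamming_distance(new_hash, known_hash) < threshold


# Hypothetical 4 x 5 grids standing in for two nearly identical images.
original = [
    [10, 20, 30, 40, 50],
    [50, 40, 30, 20, 10],
    [10, 20, 10, 20, 10],
    [9, 9, 9, 9, 9],
]
recompressed = [row[:] for row in original]
recompressed[3][1] = 8  # one pixel changed slightly

print(hamming_distance(dhash(original), dhash(recompressed)))
print(is_repost(dhash(original), dhash(recompressed), threshold=3))
```

Because small changes to an image flip only a few bits, near-duplicates end up with a small Hamming distance while unrelated images differ in many bits.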

Configuration

In bot.ini, enter your Imgur API details along with your MySQL details.
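
To give an idea of how such a file can be read, here is a small sketch using Python's standard configparser module. The section names, option names, and values below are hypothetical placeholders chosen for the example; check the bot.ini shipped with the repository for the real ones.

```python
import configparser

# Hypothetical bot.ini contents: section and option names are illustrative only.
SAMPLE_INI = """
[imgur]
client_id = YOUR_CLIENT_ID
client_secret = YOUR_CLIENT_SECRET

[mysql]
host = localhost
user = repostbot
password = CHANGE_ME
database = imgur_reposts

[options]
hash_size = 64
hamming_distance = 3
pool_size = 4
auto_downvote = false
"""


def load_settings(ini_text):
    """Parse ini text and return the values this sketch cares about."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        "client_id": parser.get("imgur", "client_id"),
        "db_host": parser.get("mysql", "host"),
        "hash_size": parser.getint("options", "hash_size"),
        "hamming_distance": parser.getint("options", "hamming_distance"),
        "pool_size": parser.getint("options", "pool_size"),
        "auto_downvote": parser.getboolean("options", "auto_downvote"),
    }


settings = load_settings(SAMPLE_INI)
print(settings)
```

To read an actual file instead of a string, parser.read("bot.ini") can be used in place of read_string.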

A template MySQL file can be found in the sql folder.

Once ready, run ImgurRepostBot.py.

Notable Features

  • Automatic API rate limiting. The bot continually checks your remaining credits and the time until they reset, then adjusts the request delay so the remaining credits last until the reset.
  • Backfill database. The bot can work backwards through older Usersub pages while still fetching the newest images. You can set the starting page and depth via the ini.
  • Configurable process pool size. This lets you tweak how much CPU is used while comparing hashes for reposts. Large hashes are CPU intensive.
  • Configurable hash size and Hamming distance, which let you tweak the accuracy of repost detection.
  • Enable or disable automatic downvotes and comments via bot.ini.
  • Modify settings in the .ini file while the bot is running.
  • Automatic retry of failed comments and downvotes. If Imgur is over capacity, they are saved and tried again later.
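
The automatic rate limiting in the first feature above comes down to spreading the remaining credits over the time left until they reset. The following is a minimal sketch of that arithmetic, not the bot's actual implementation; the function name, its parameters, and the min_delay floor are assumptions made for the example.

```python
import time


def request_delay(remaining_credits, reset_epoch, now=None, min_delay=1.0):
    """Seconds to wait between requests so the credits last until the reset.

    remaining_credits: API credits still available in the current window.
    reset_epoch: Unix timestamp at which the credits are restored.
    min_delay: hypothetical floor so the bot never hammers the API.
    """
    if now is None:
        now = time.time()
    seconds_left = max(reset_epoch - now, 0.0)
    if remaining_credits <= 0:
        # Out of credits: the only option is to wait for the reset.
        return seconds_left
    return max(seconds_left / remaining_credits, min_delay)


# 100 credits left and 500 seconds until the reset: one request every 5 seconds.
print(request_delay(100, reset_epoch=1500, now=1000))
```

Recomputing the delay after each request lets it adapt as credits are spent or restored.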

Comment Template Usage

You can specify a custom comment template via the bot.ini file. In the template, {value} placeholders are replaced with values from the detected match.

Available Values Are:

  • {count} - Prints the number of matching images
  • {g_url} - Prints the gallery URL of the oldest match
  • {d_url} - Prints the direct URL of the oldest match
  • {submitted_epoch} - Prints the epoch timestamp of the oldest match
  • {submitted_human} - Prints the human-readable date of the oldest match
  • {user} - Prints the user of the oldest match

Example: Image submitted {count} times. First seen {submitted_human}
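
Placeholders in this style map naturally onto Python's str.format. The sketch below shows one way the substitution could work; it is an assumption, not the bot's actual code. The render_comment function, the shape of the match records, the date format, and the sample URLs and user names are all invented for the example.

```python
from datetime import datetime, timezone


def render_comment(template, matches):
    """Fill a comment template using the oldest of the matching images."""
    oldest = min(matches, key=lambda match: match["submitted_epoch"])
    submitted = datetime.fromtimestamp(oldest["submitted_epoch"], tz=timezone.utc)
    values = {
        "count": len(matches),
        "g_url": oldest["g_url"],
        "d_url": oldest["d_url"],
        "submitted_epoch": oldest["submitted_epoch"],
        "submitted_human": submitted.strftime("%Y-%m-%d %H:%M:%S UTC"),
        "user": oldest["user"],
    }
    return template.format(**values)


# Hypothetical match records; URLs and user names are placeholders.
matches = [
    {"g_url": "https://imgur.com/gallery/bbb222", "d_url": "https://i.imgur.com/bbb222.jpg",
     "submitted_epoch": 1425168000, "user": "second_poster"},
    {"g_url": "https://imgur.com/gallery/aaa111", "d_url": "https://i.imgur.com/aaa111.jpg",
     "submitted_epoch": 1420070400, "user": "first_poster"},
]

print(render_comment("Image submitted {count} times. First seen {submitted_human}", matches))
```

Placeholders that a template does not use are simply ignored, so the same set of values works for any template built from the list above.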

Required Libraries

Disclaimer

This is a WIP: I'm still adding features and experimenting with it.

I'm not a professional programmer, more of a hobbyist, so the code may not be the cleanest. I welcome suggestions and pull requests.

I made this bot as coding practice.

I'm aware the functionality is similar to RepostStatistics.

Use At Your Own Risk
