Image Preprocessing and Breaking Captcha

Break simple captchas with image preprocessing, using OpenCV in Python together with Tesseract

Test various preprocessing procedures
Recognize text via tesseract
Compute accuracy of text recognition on given dataset

Interactive version

Command line version

Further report

Interactive Version

1. Preprocess image

You can read an image from a local directory by entering its full path, or download one by entering a URL.
If the path or URL is invalid, the program downloads an image from the default link instead.

There are 4 preprocessing steps to choose from.
Enter the steps you wish to apply, in the order you want them applied.
The image after each step is displayed, along with the original image and the text read by Tesseract.
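Applying the chosen steps in order can be pictured as a small dispatcher. The sketch below is a guess at the idea, not the actual implementation: the digit-to-step mapping (1 = binarise, 2 = crop, 3 = close, 4 = blur) is inferred from the command-line examples in this README, `parse_order` and `run_pipeline` are hypothetical names, and rejecting unknown digits is an assumption.

```python
# Mapping inferred from the command-line examples in this README
# (default order: binarise -> crop -> close -> blur). Treat it as an assumption.
STEP_NAMES = {"1": "binarise", "2": "crop", "3": "close", "4": "blur"}


def parse_order(order):
    """Turn an order string such as "132" into a list of step names."""
    unknown = [ch for ch in order if ch not in STEP_NAMES]
    if unknown:
        raise ValueError(f"unknown step(s): {''.join(unknown)}")
    return [STEP_NAMES[ch] for ch in order]


def run_pipeline(image, order, steps):
    """Apply the chosen steps in order and keep every intermediate result.

    `steps` maps a step name to a function image -> image. The returned list
    starts with the untouched original, so each stage can be displayed.
    """
    results = [("original", image)]
    for name in parse_order(order):
        image = steps[name](image)
        results.append((name, image))
    return results
```

Keeping the step functions in a dictionary means a step can be repeated or left out simply by changing the order string.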

2. Show success rate for dataset

You must provide a complete dataset to verify the accuracy of the preprocessing: the images must be in .png format, and each file name must be the target label of its image.
The directory in which the program is running is displayed, in case your dataset is in the same directory.
After entering the path to your dataset, you can choose which steps to apply.
Two accuracies are reported. The former is the ratio of correctly recognized letters to total letters, which is informative when a label contains multiple letters.
The latter is the ratio of captchas whose whole text was recognized correctly, which is the figure to use if you're concerned with actual accuracy.
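The two ratios can be computed as in the sketch below. It assumes letters are compared position by position, which this README does not specify; `label_from_filename` and `score` are hypothetical names.

```python
from pathlib import Path


def label_from_filename(path):
    """The file name without its extension is the expected captcha text."""
    return Path(path).stem


def score(labels, predictions):
    """Return (letter_accuracy, captcha_accuracy) for paired labels/predictions."""
    correct_letters = total_letters = correct_captchas = 0
    for label, predicted in zip(labels, predictions):
        total_letters += len(label)
        # Position-wise comparison; zip stops at the shorter string, so
        # missing or extra characters never count as correct.
        correct_letters += sum(a == b for a, b in zip(label, predicted))
        correct_captchas += label == predicted
    letter_acc = correct_letters / total_letters if total_letters else 0.0
    captcha_acc = correct_captchas / len(labels) if labels else 0.0
    return letter_acc, captcha_acc
```

For example, with labels `["003369", "1234"]` and predictions `["003369", "1284"]`, 9 of 10 letters match but only 1 of 2 captchas is fully correct, so the letter ratio overstates how often a captcha would actually be solved.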

3. Test Binarisation

The result of testing binarisation is shown immediately with a random image.
The result will show 4 images, including the original image that was processed.
'simple binary 125' is the result of simple thresholding with a threshold value of 125. 'simple binary' is the result of simple thresholding with the threshold value chosen from the image itself. 'adaptive' is the result of adaptive thresholding.

4. Test Morphology

The result of testing morphology is shown immediately with a random image.
The result will show 4 images, including the original image that was processed.
'dilation' is the result of dilation, in which the area of white pixels increases. 'erosion' is the result of erosion, in which the area of black pixels increases. 'closed' is the result of closing, in which the image is dilated and then eroded, in that order.

5. Test Blurring

The result of testing blurring is shown immediately with a random image.
The result will show 4 images, including the original image that was processed.
Each image is processed with the filter named in its description.

6. Quit

Ends the program.

Command Line Version

An operation must be chosen (1, 2, 3, 4 or 5).
--path and --order are optional arguments; defaults are used when they are omitted.
- For operation 2 (computing the success rate), the path to the dataset must be given.
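The interface described above maps naturally onto `argparse`. The sketch below is an assumption about how it might be wired, not the script's actual code: the default order `"1234"` is inferred from the default processing order (binarise -> crop -> close -> blur), and rejecting operation 2 without `--path` at parse time is a design choice of this sketch.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="CaptchaBreaker.py")
    parser.add_argument("operation", type=int, choices=[1, 2, 3, 4, 5],
                        help="1 preprocess, 2 success rate, 3 binarisation, "
                             "4 morphology, 5 blurring")
    parser.add_argument("--path", default=None,
                        help="image file, or dataset directory for operation 2")
    parser.add_argument("--order", default="1234",
                        help="digits of the preprocessing steps, in order")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce that operation 2 receives a dataset path."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.operation == 2 and args.path is None:
        # parser.error prints the usage message and exits with status 2.
        parser.error("operation 2 requires --path to a dataset")
    return args
```

Failing early with a usage message is friendlier than letting the run reach an empty dataset before reporting an error.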

Examples

python CaptchaBreaker.py 1 : operation 1 with no optional arguments
-> Processes a random image from the default link in the default processing order (binarise -> crop -> close -> blur)

python CaptchaBreaker.py 1 --order 132 : operation 1 with order argument
-> Processes a random image from the default link in the given processing order (binarise -> close -> crop)

python CaptchaBreaker.py 2 --path \Users\argos\PycharmProjects\CaptchaBreak\images --order 123 : operation 2 with path and order arguments
-> Processes the images in the dataset at the given path in the given processing order (binarise -> crop -> close), and prints the recognition success rate

python CaptchaBreaker.py 3 --path \Users\argos\PycharmProjects\CaptchaBreak\images\003369.png : operation 3 with path argument
-> Tests binarisation on the given image

Other examples

python CaptchaBreaker.py 2 : operation 2 without path to dataset
-> Error, because the dataset is empty

python CaptchaBreaker.py 4 --order 13483 : operation 4 with order argument
-> No error. The program runs successfully and ignores the order argument

amisha-w/captcha_image_preprocess

Folders and files

Latest commit

History

Repository files navigation

Image Preprocessing and Breaking Captcha

Interactive Version

1. Preprocess image

2. Show success rate for dataset

3. Test Binarisation

4. Test Morphology

5. Test Blurring

6. Quit

Command Line Version

Examples

Other examples

About

Resources

License

Stars

Watchers

Forks

Languages