Break simple CAPTCHAs with image preprocessing, using OpenCV and Tesseract in Python
- Test various preprocessing procedures
- Recognize text via tesseract
- Compute accuracy of text recognition on given dataset
- Reads an image from a local directory when you enter its full path, or downloads an image when you enter a URL
- If the path or URL is invalid, an image is downloaded from the default link instead
- There are 4 preprocessing steps you can choose from
- Enter the steps you wish to include, in the order you want them applied
- The original image and the image after each step are displayed, along with the text read by Tesseract
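The input handling described above can be sketched as a small helper that decides where the image comes from. This is only an illustration: `resolve_source` and `DEFAULT_URL` are hypothetical names, the placeholder URL is not the project's real default link, and the actual script may make this decision differently.

```python
import os
from urllib.parse import urlparse

# Hypothetical placeholder; the project's real default link is not shown here.
DEFAULT_URL = "https://example.com/captcha.png"


def resolve_source(user_input, default_url=DEFAULT_URL):
    """Classify the input as a local file or a URL, else fall back to the default link."""
    # An existing local file wins.
    if os.path.isfile(user_input):
        return ("file", user_input)
    # Otherwise accept anything that looks like an http(s) URL.
    parsed = urlparse(user_input)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", user_input)
    # Neither a readable file nor a URL: use the default link.
    return ("url", default_url)
```

A URL that is well formed but unreachable would still need a fallback at download time; that step is left out of this sketch.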
- You must provide a full dataset to verify the accuracy of preprocessing. The dataset must consist of images in .png format, and each file name must be the target label of that image
- The current path in which the program is running is given, in case your dataset is in the same directory.
- After entering your dataset, you can choose which steps you want to take.
- If your labels contain multiple letters, the first accuracy reported is the ratio of correctly recognized letters to total letters.
- If you're concerned with whole-CAPTCHA accuracy, the second value is the ratio of CAPTCHA texts that were recognized completely correctly.
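The two accuracy figures above could be computed roughly as follows. This is a minimal sketch under stated assumptions: letters are compared position by position, recognized text that is shorter or longer than the label simply scores fewer matches, and the names `label_from_filename` and `score` are illustrative rather than taken from the script.

```python
from pathlib import Path


def label_from_filename(filename):
    """The target label is the file name without its .png extension."""
    return Path(filename).stem


def score(pairs):
    """pairs: iterable of (label, recognized_text).

    Returns (letter_accuracy, captcha_accuracy).
    """
    correct_letters = total_letters = correct_captchas = total = 0
    for label, text in pairs:
        total += 1
        total_letters += len(label)
        # Assumption: compare letters at the same position only.
        correct_letters += sum(a == b for a, b in zip(label, text))
        correct_captchas += int(label == text)
    letter_acc = correct_letters / total_letters if total_letters else 0.0
    captcha_acc = correct_captchas / total if total else 0.0
    return letter_acc, captcha_acc
```

For instance, one fully correct six-letter label plus one four-letter label with a single wrong letter gives 9 of 10 letters correct, but only 1 of 2 CAPTCHAs correct.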
- The result of testing binarisation is shown immediately with a random image.
- The result will show 4 images, including the original image that was processed.
- 'simple binary 125' is the result of simple thresholding with a threshold value of 125. 'simple binary' is the result of simple thresholding with the threshold value chosen from the image. 'adaptive' is the result of adaptive thresholding.
- The result of testing morphology is shown immediately with a random image.
- The result will show 4 images, including the original image that was processed.
- 'dilation' is the result of dilation, in which the area of white pixels increases. 'erosion' is the result of erosion, in which the area of black pixels increases. 'closed' is the result of closing, in which the image is dilated and then eroded, in that order.
- The result of testing blurring is shown immediately with a random image.
- The result will show 4 images, including the original image that was processed.
- Each image is processed with the filter given in the description
- End operation
- An operation must be chosen (1, 2, 3, 4, or 5)
- --path and --order are optional arguments; default values are used when they are omitted
- For operation 2 (computing the success rate), the path to the dataset must be given
- python CaptchaBreaker.py 1
: operation 1 with no optional arguments -> Process a random image from the default link in the default processing order (binarise -> crop -> close -> blur)
- python CaptchaBreaker.py 1 --order 132
: operation 1 with order argument -> Process a random image from the default link in the given processing order (binarise -> close -> crop)
- python CaptchaBreaker.py 2 --path \Users\argos\PycharmProjects\CaptchaBreak\images --order 123
: operation 2 with path and order arguments -> Process the images in the dataset at the given path in the given processing order (binarise -> crop -> close), and print the recognition success rate
- python CaptchaBreaker.py 3 --path \Users\argos\PycharmProjects\CaptchaBreak\images\003369.png
: operation 3 with path argument -> Test binarisation on the given image
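The command-line interface shown in these examples could be parsed along the following lines. This is a sketch, not the script's real argument handling: the step numbering (1 = binarise, 2 = crop, 3 = close, 4 = blur) is inferred from the usage examples above, the `None` default for `--path` is an assumption, and the function names are hypothetical.

```python
import argparse

# Step numbering inferred from the usage examples
# (e.g. 132 -> binarise, close, crop).
STEPS = {"1": "binarise", "2": "crop", "3": "close", "4": "blur"}


def build_parser():
    parser = argparse.ArgumentParser(prog="CaptchaBreaker.py")
    parser.add_argument("operation", type=int, choices=[1, 2, 3, 4, 5])
    # Assumption: no path means "use the default link".
    parser.add_argument("--path", default=None)
    parser.add_argument("--order", default="1234")
    return parser


def steps_from_order(order):
    """Translate an order string such as '132' into step names."""
    unknown = [digit for digit in order if digit not in STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {unknown}")
    return [STEPS[digit] for digit in order]


def parse_cli(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # Operation 2 computes the success rate and needs a dataset path.
    if args.operation == 2 and args.path is None:
        parser.error("operation 2 requires --path to the dataset")
    return args
```

With this sketch, `parse_cli(["1", "--order", "132"])` yields an order string that `steps_from_order` turns into binarise, close, crop, matching the second example.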