Synthetic Dataset Generator (Hindi)

Generates a synthetic dataset of images, each with a Devanagari word drawn on a background image.

Prerequisites:

Python 3.5 or above
Pillow (PIL)
numpy
matplotlib

You will also need to install Pillow's required external libraries before you run the code. These libraries enable PIL to draw complex scripts like Devanagari on images correctly.

Follow the steps on Pillow's Installation page to install the libraries for your system: https://pillow.readthedocs.io/en/stable/installation.html
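
To confirm that the external libraries were picked up, it can help to check whether Pillow reports Raqm support, since Pillow's documentation lists Raqm as the library that provides complex text layout. The snippet below is a minimal sketch, not part of Generator.py; the function name `raqm_available` is hypothetical.

```python
def raqm_available():
    """Return True/False for Raqm support, or None if Pillow is not installed."""
    try:
        from PIL import features
    except ImportError:
        return None
    # features.check reports whether Pillow was built with the named feature.
    return bool(features.check("raqm"))


if __name__ == "__main__":
    status = raqm_available()
    if status is None:
        print("Pillow is not installed")
    elif status:
        print("Raqm is available: complex scripts should be shaped correctly")
    else:
        print("Raqm is missing: Devanagari text may render incorrectly")
```

If the check reports that Raqm is missing, revisit the installation steps before generating a dataset.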

How to run the script:

python3 Generator.py "dataset_size" -lv -v -mt

The script accepts the following arguments:

dataset_size -> Number of Images to generate
-h, --help -> Show this help message and exit
-lv, --large_vocab -> Use Large Vocab, Default is Small Vocab
-v, --verbose -> (Verbose) Display progress of generation
-mt, --multithreading -> Use multithreading
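
For reference, the argument list above could be expressed with `argparse` roughly as follows. This is a sketch that assumes the script uses `argparse` and that `dataset_size` is parsed as an integer; the actual parsing code in Generator.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        description="Generate synthetic images of Hindi words.")
    # Assumption: the dataset size is a plain integer count.
    parser.add_argument("dataset_size", type=int,
                        help="Number of images to generate")
    parser.add_argument("-lv", "--large_vocab", action="store_true",
                        help="Use the large vocab (default: small vocab)")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Display progress of generation")
    parser.add_argument("-mt", "--multithreading", action="store_true",
                        help="Use multithreading")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this layout every flag is optional and defaults to off, so `python3 Generator.py 100` would generate 100 images from the small vocab without progress output or multithreading.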

You can choose between 2 vocab files, Large_hindi_vocab.txt and Small_hindi_vocab.txt. The large vocab consists of 208162 Hindi words. The small vocab consists of 9948 Hindi words.

This will create "dataset_size" images in the directory /Images. Each image uses a random background chosen from the /Backgrounds folder and a random word from the vocab file you choose.
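
The generation step described above can be sketched as follows. This is an illustration only, not the code in Generator.py: it assumes the vocab file holds one word per line, and the function names, font size, text position, and text colour are all hypothetical choices.

```python
import random


def load_vocab(path):
    """Read one word per line, skipping blank lines (assumed file layout)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def pick_sample(words, background_paths, rng=random):
    """Choose a random (word, background path) pair."""
    return rng.choice(words), rng.choice(background_paths)


def render(word, background_path, font_path, out_path, font_size=48):
    """Draw `word` on the background image and save the result."""
    # Imported here so the helpers above work without Pillow installed.
    from PIL import Image, ImageDraw, ImageFont

    image = Image.open(background_path).convert("RGB")
    font = ImageFont.truetype(font_path, font_size)
    # Position and colour are arbitrary placeholders for this sketch.
    ImageDraw.Draw(image).text((10, 10), word, font=font, fill=(0, 0, 0))
    image.save(out_path)
```

Passing a seeded `random.Random` instance as `rng` makes the word and background choices reproducible, which can be useful when regenerating the same dataset.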

Sample Image: sample.jpg

Ground truths are stored in the /Ground_truths.txt file, in the following format:

"Image_number" "ground_truth"

Example:

20 कबीरा

This means that कबीरा is drawn on the Images/20.jpg file.
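
To pair each generated image with its label, the ground truth file can be read back with a few lines of Python. The sketch below assumes the image number and the word are separated by whitespace and that every image is saved as a .jpg under Images; the function names are hypothetical.

```python
import os


def parse_ground_truth_line(line):
    """Split a line such as '20 कबीरा' into (20, 'कबीरा')."""
    # Split on the first run of whitespace only, so the word stays intact.
    number, word = line.strip().split(None, 1)
    return int(number), word


def load_ground_truths(path, image_dir="Images"):
    """Map each image path to the word drawn on it."""
    mapping = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            number, word = parse_ground_truth_line(line)
            mapping[os.path.join(image_dir, "{}.jpg".format(number))] = word
    return mapping
```

Opening the file with `encoding="utf-8"` matters here, because the platform default encoding may not be able to decode Devanagari text.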

rahul75/SyntheticDatasetGenerator
