Infographics - Dataset Creation

Google Drive Link: https://drive.google.com/drive/folders/1JAPL8ZMNFl7CP-gI3cYdCXtI0__api4v?usp=sharing

Scraper reference: https://github.com/geduldig/TwitterGeoPics

Project structure

The infographics_dataset_collection folder has the following structure:

code/

a. scraper.ipynb = Notebook to scrape infographics from Twitter

b. classifier.ipynb = Notebook to train ResNet50 CNN classifier

c. predictions.ipynb = Notebook to predict labels from test data

data/

a. raw/ = Images downloaded from Twitter using the scraper

b. train/ = Labeled images ('info', 'notinfo')

c. test/ = Unlabeled images

models/ = Stores the trained models used for predictions

a. infographics-classifier.pth = Trained ResNet50 classifier

output/

a. predictions.csv = Model predictions in the format (filename, label)

b. info/ = Folder containing all images from the test set labelled as 'info' by the model

twitter_auth.conf = Twitter authentication credentials for scraping (steps to obtain tokens: https://gist.github.com/varunchaudharycs/2af83ccb19a03265a24c4942e8248c3c)

TwitterGeoPics/ = The scraper used to collect images
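
The folder layout above can be scaffolded before running the notebooks. The following is a minimal sketch using only the standard library; the helper name `create_layout` is hypothetical and not part of the repository, and the `info`/`notinfo` subfolders under `data/train/` follow the labels described in this README.

```python
from pathlib import Path

# Folder names taken from the project structure described in this README.
PROJECT_DIRS = [
    "code",
    "data/raw",
    "data/train/info",
    "data/train/notinfo",
    "data/test",
    "models",
    "output/info",
]


def create_layout(root):
    """Create the expected folders under `root` (hypothetical helper).

    Existing folders are left untouched, so calling this twice is safe.
    Returns the list of folder paths that are guaranteed to exist afterwards.
    """
    root = Path(root)
    ensured = []
    for relative in PROJECT_DIRS:
        path = root / relative
        path.mkdir(parents=True, exist_ok=True)
        ensured.append(path)
    return ensured
```

Calling `create_layout("infographics_dataset_collection")` would create any missing folders without touching files already present.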

Dependencies

Packages required:

numpy
torch
torchvision
Fridge
tweepy
pygeocoder
tzwhere
TwitterAPI
os (Python standard library)
csv (Python standard library)
PIL (installed via the Pillow package)

Environment variables (paths) are set in the respective notebooks.
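
Before opening the notebooks, it can help to check which of the packages listed above are installed. This is a sketch only: it assumes the names above match their pip distribution names (with `PIL` provided by the `Pillow` distribution), and the helper `missing_packages` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed pip distribution names for the packages listed above.
# `os` and `csv` ship with Python and are intentionally omitted.
REQUIRED = [
    "numpy",
    "torch",
    "torchvision",
    "Fridge",
    "tweepy",
    "pygeocoder",
    "tzwhere",
    "TwitterAPI",
    "Pillow",
]


def missing_packages(names=REQUIRED):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            version(name)  # raises if the distribution is not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Running `print(missing_packages())` lists whatever still needs to be installed; an empty list means every distribution was found.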

Steps

Run the scraper notebook to populate the raw data in data/raw/.
Manually label the raw data to form the training set by sorting images into data/train/info/ and data/train/notinfo/.
Run the classifier notebook to train the classifier on the labelled data.
Run the predictions notebook to label the unlabelled images in data/test/.
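
The output of the last step can be sketched as follows. This is not the code from predictions.ipynb: the function `write_predictions`, the `classify` callable (a stand-in for the trained ResNet50 model), the accepted image suffixes, and the CSV header row are all assumptions made for illustration. Only the (filename, label) format and the output/info/ folder come from this README.

```python
import csv
import shutil
from pathlib import Path

# Assumption: the image types present in data/test/.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_predictions(test_dir, output_dir, classify):
    """Label every image in `test_dir` and record the results.

    `classify` is any callable that takes an image path and returns
    'info' or 'notinfo' (hypothetical stand-in for the trained model).
    Writes `predictions.csv` as (filename, label) rows and copies images
    labelled 'info' into `output_dir/info/`.
    """
    test_dir, output_dir = Path(test_dir), Path(output_dir)
    info_dir = output_dir / "info"
    info_dir.mkdir(parents=True, exist_ok=True)

    rows = []
    for image in sorted(test_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        label = classify(image)
        rows.append((image.name, label))
        if label == "info":
            shutil.copy2(image, info_dir / image.name)

    with open(output_dir / "predictions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["filename", "label"])  # header row is an assumption
        writer.writerows(rows)
    return rows
```

In the notebook, `classify` would wrap the model loaded from models/infographics-classifier.pth; here it is left as a parameter so the file handling can be tried without a trained model.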
