TensorFlow Input Pipelines

Use these TensorFlow (v0.11) input pipelines to automatically download and efficiently fetch batches of data and labels from some of the most widely used datasets in deep learning. The implementations are threaded, support shuffling, and scale to large datasets such as ImageNet.
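
For context, the dataset classes below wrap the classic TensorFlow v0.11 queue-runner pattern. Here is a minimal, self-contained sketch of that pattern with toy data (this is plain TensorFlow, not this repository's API):

import numpy as np
import tensorflow as tf

# Toy in-memory data standing in for a real dataset.
images = np.random.rand(1000, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=1000).astype(np.int32)

# Queue up single examples, then let background threads assemble batches.
image, label = tf.train.slice_input_producer([images, labels], shuffle=True)
image_batch, label_batch = tf.train.shuffle_batch(
  [image, label], batch_size=64, num_threads=4,
  capacity=2000, min_after_dequeue=500)

sess = tf.Session()
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

print(sess.run(image_batch).shape)  # (64, 32, 32, 3)

coord.request_stop()
coord.join(threads)
sess.close()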

Supported Datasets

  • MNIST
  • CIFAR-10
  • CIFAR-100
  • SVHN
  • Stanford Cars 196
  • ImageNet (no automatic data download, but a shell script is provided in utils/imagenet_download/)
  • Penn Treebank

(more datasets will be added soon ...)

Example

import tensorflow as tf
sess = tf.Session()

# Build the input pipeline on the CPU so the GPU stays free for the model.
with tf.device('/cpu:0'):
  from datasets.svhn import svhn_data
  d = svhn_data(batch_size=256, sess=sess)
  image_batch_tensor, target_batch_tensor = d.build_train_data_tensor()

for i in range(5):
  print("batch ", i)
  # Each run() call dequeues the next batch prepared by the loader threads.
  image_batch, target_batch = sess.run([image_batch_tensor, target_batch_tensor])
  # logits = model(image_batch, target_batch)
  # ...
  print(image_batch.shape)
  print(target_batch.shape)

# Stop the loader threads before closing the session.
d.close()
sess.close()
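
The commented-out model call is the spot where a network plugs in. Because the batch tensors are ordinary graph tensors, a model can also consume them directly instead of fetched NumPy arrays. A minimal sketch, assuming SVHN's 32x32x3 images and one-hot 10-class targets (the linear classifier is illustrative, not part of the repo):

# Illustrative only: a tiny linear classifier wired directly to the
# pipeline's batch tensors. Assumes 32x32x3 images, one-hot 10-class targets.
flat = tf.reshape(image_batch_tensor, [-1, 32 * 32 * 3])
W = tf.Variable(tf.zeros([32 * 32 * 3, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(flat, W) + b

loss = tf.reduce_mean(
  tf.nn.softmax_cross_entropy_with_logits(logits, target_batch_tensor))
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

sess.run(tf.initialize_all_variables())
for i in range(100):
  # Each step dequeues a fresh batch from the pipeline automatically.
  _, loss_val = sess.run([train_op, loss])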

Installation and Running the CIFAR-100 Example

schlag@box:~/MyStuff/input_pipelines$ mkvirtualenv $(pwd | awk '{print $1"/env"}')
Using base prefix '/usr'
New python executable in /home/schlag/MyStuff/input_pipelines/env/bin/python3
Also creating executable in /home/schlag/MyStuff/input_pipelines/env/bin/python
Installing setuptools, pip, wheel...done.
schlag@box:~/MyStuff/input_pipelines$ source env/bin/activate
(env) schlag@box:~/MyStuff/input_pipelines$ pip3 install -r pip3_requirements.txt 
Collecting numpy==1.11.2 (from -r pip3_requirements.txt (line 1))
  Using cached numpy-1.11.2-cp35-cp35m-manylinux1_x86_64.whl
Collecting pickleshare==0.7.4 (from -r pip3_requirements.txt (line 2))
  Using cached pickleshare-0.7.4-py2.py3-none-any.whl
Collecting protobuf==3.0.0 (from -r pip3_requirements.txt (line 3))
  Using cached protobuf-3.0.0-py2.py3-none-any.whl
Collecting scipy==0.18.1 (from -r pip3_requirements.txt (line 4))
  Using cached scipy-0.18.1-cp35-cp35m-manylinux1_x86_64.whl
Collecting six==1.10.0 (from -r pip3_requirements.txt (line 5))
  Using cached six-1.10.0-py2.py3-none-any.whl
Requirement already satisfied: setuptools in ./env/lib/python3.5/site-packages (from protobuf==3.0.0->-r pip3_requirements.txt (line 3))
Installing collected packages: numpy, pickleshare, six, protobuf, scipy
Successfully installed numpy-1.11.2 pickleshare-0.7.4 protobuf-3.0.0 scipy-0.18.1 six-1.10.0
(env) schlag@box:~/MyStuff/input_pipelines$ pip3 install ../tf-builds/tensorflow-0.11.0rc2-cp35-cp35m-linux_x86_64.whl 
Processing /home/schlag/MyStuff/tf-builds/tensorflow-0.11.0rc2-cp35-cp35m-linux_x86_64.whl
Requirement already satisfied: wheel>=0.26 in ./env/lib/python3.5/site-packages (from tensorflow==0.11.0rc2)
Requirement already satisfied: six>=1.10.0 in ./env/lib/python3.5/site-packages (from tensorflow==0.11.0rc2)
Collecting protobuf==3.1.0 (from tensorflow==0.11.0rc2)
  Using cached protobuf-3.1.0-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.11.0 in ./env/lib/python3.5/site-packages (from tensorflow==0.11.0rc2)
Requirement already satisfied: setuptools in ./env/lib/python3.5/site-packages (from protobuf==3.1.0->tensorflow==0.11.0rc2)
Installing collected packages: protobuf, tensorflow
  Found existing installation: protobuf 3.0.0
    Uninstalling protobuf-3.0.0:
      Successfully uninstalled protobuf-3.0.0
Successfully installed protobuf-3.1.0 tensorflow-0.11.0rc2
(env) schlag@box:~/MyStuff/input_pipelines$ python cifar-100_example.py 
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0.27 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0.27 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0.27 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 6.63GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
Loading CIFAR-100 data
- Download progress: 100.0%
Download finished. Extracting files.
Extracting finished. Cleaning up.
Done.
Loading data: data/CIFAR-100/cifar-100-python/train
batch  0
(256, 32, 32, 3)
(256, 100)
batch  1
(256, 32, 32, 3)
(256, 100)
batch  2
(256, 32, 32, 3)
(256, 100)
batch  3
(256, 32, 32, 3)
(256, 100)
batch  4
(256, 32, 32, 3)
(256, 100)
batch  5
(256, 32, 32, 3)
(256, 100)
batch  6
(256, 32, 32, 3)
(256, 100)
batch  7
(256, 32, 32, 3)
(256, 100)
batch  8
(256, 32, 32, 3)
(256, 100)
batch  9
(256, 32, 32, 3)
(256, 100)
done!

Download the ImageNet Data

You need to use the supplied shell script to download the ImageNet data. This can take a long time; the train archive alone is almost 150 GB.

(env) schlag@box:~/MyStuff/input_pipelines$ cd utils/imagenet_download/
(env) schlag@box:~/MyStuff/input_pipelines/utils/imagenet_download$ sh run_me.sh
** snip (this will take a while)  **
(env) schlag@box:~/MyStuff/input_pipelines/utils/imagenet_download$ cd ../../
(env) schlag@box:~/MyStuff/input_pipelines$ python imagenet_example.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so.8.0.27 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so.5.1.5 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so.8.0.27 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so.8.0.27 locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.7335
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 6.61GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
Successfully read 615299 bounding boxes across 544546 images.
Determining list of input files and labels from data/imagenet/validation/.
Finished finding files in 100 of 1000 classes.
Finished finding files in 200 of 1000 classes.
Finished finding files in 300 of 1000 classes.
Finished finding files in 400 of 1000 classes.
Finished finding files in 500 of 1000 classes.
Finished finding files in 600 of 1000 classes.
Finished finding files in 700 of 1000 classes.
Finished finding files in 800 of 1000 classes.
Finished finding files in 900 of 1000 classes.
Finished finding files in 1000 of 1000 classes.
Found 50000 JPEG files across 1000 labels inside data/imagenet/validation/.
Determining list of input files and labels from data/imagenet/train/.
Finished finding files in 100 of 1000 classes.
Finished finding files in 200 of 1000 classes.
Finished finding files in 300 of 1000 classes.
Finished finding files in 400 of 1000 classes.
Finished finding files in 500 of 1000 classes.
Finished finding files in 600 of 1000 classes.
Finished finding files in 700 of 1000 classes.
Finished finding files in 800 of 1000 classes.
Finished finding files in 900 of 1000 classes.
Finished finding files in 1000 of 1000 classes.
Found 1281167 JPEG files across 1000 labels inside data/imagenet/train/.
Loading imagenet data
Train directory seems to exist
Validation directory seems to exist
batch  0
(64, 299, 299, 3)
(64, 1000)
batch  1
(64, 299, 299, 3)
(64, 1000)
batch  2
(64, 299, 299, 3)
(64, 1000)
batch  3
(64, 299, 299, 3)
(64, 1000)
batch  4
(64, 299, 299, 3)
(64, 1000)
batch  5
(64, 299, 299, 3)
(64, 1000)
batch  6
(64, 299, 299, 3)
(64, 1000)
batch  7
(64, 299, 299, 3)
(64, 1000)
batch  8
(64, 299, 299, 3)
(64, 1000)
batch  9
(64, 299, 299, 3)
(64, 1000)
done!

Train Script Template

A CNN training script template is provided with the following features:

  • easy switching of datasets
  • separate training and testing streams
  • continuous console log
  • test-set evaluation after every epoch
  • automatically saves the best-performing model parameters
  • automatically decreases the learning rate if there is no improvement in accuracy (see the sketch after this list)
  • evaluates top-1 and top-n accuracies
  • easy parameter loading from a previous save point to continue training
  • prints a confusion matrix in your console
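
A minimal sketch of the plateau logic behind the checkpointing and learning-rate features above (train_one_epoch, evaluate_test_set, and the patience/decay values are placeholders, not the template's actual names):

import tensorflow as tf

def train_one_epoch(sess, lr):
  """Placeholder: run the training ops for one epoch at learning rate lr."""

def evaluate_test_set(sess):
  """Placeholder: return test-set accuracy."""
  return 0.0

w = tf.Variable(0.0)  # stand-in for the model parameters
saver = tf.train.Saver()

sess = tf.Session()
sess.run(tf.initialize_all_variables())

lr, best_accuracy, stale_epochs = 0.1, 0.0, 0
for epoch in range(50):
  train_one_epoch(sess, lr)
  accuracy = evaluate_test_set(sess)
  if accuracy > best_accuracy:
    # New best model: reset the patience counter and save the weights.
    best_accuracy, stale_epochs = accuracy, 0
    saver.save(sess, './best_model')
  else:
    stale_epochs += 1
    if stale_epochs >= 3:  # patience threshold (assumed)
      lr *= 0.1            # decay factor (assumed)
      stale_epochs = 0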
