SpikeFuel

SpikeFuel is a toolkit for converting popular visual benchmarks to spiking neuromorphic datasets.

The design principle of this package is to eliminate human intervention during the experiment as much as possible. This way, the human experimenter just needs to set up a proper environment and let the pipeline run.

The general objectives are:

  • Precise control of recording logging with Python. 🏁
  • User interface for showing videos or images in a routine. 🏁
  • Experiment configuration system (JSON-based). 🏁
  • Post-experiment signal analysis and selection tools.

This piece of code is under rapid development, and the author makes stupid mistakes everywhere (typos, messed-up data types, untested code, etc.). Until everything is stable, please use with caution.

Dependency

At the moment, I haven't written a setup script, so you need to manually install the dependencies listed here.

The scientific Python distribution Anaconda provides most of the dependencies. I recommend this distribution if you don't want to mess with the system's Python.

Standard library modules (you don't have to install these yourself):

  • cPickle
  • glob
  • os
  • socket
  • struct
  • sys
  • time

3rd party packages

  • numpy (included in Anaconda)
  • sacred (install with pip install sacred)
  • subprocess32 (install with pip install subprocess32)

3rd party packages that are installed with Anaconda

  • ffmpeg (follow the project's installation instructions)
  • pyav (follow the project's installation instructions)
  • opencv (follow the project's installation instructions)

List of Solved and Unsolved Problems

  • Basic drawing function for image and video datasets
  • Basic I/O utilities
  • GUI with drawing bounding boxes and static background
  • Precise remote control of jAER recording via Python
  • Test remote control with DVS and jAER.
  • Find a way of avoiding the jAER rendering problem on Mac
  • Keyboard and mouse support in GUI
  • Use JSON as experiment configuration
  • Bounding box generation based on relative position
  • Test trial for experiment flow
  • Find a higher-performance Linux desktop/workstation to work around the memory and processing limits of my Mac (no longer needed)
  • Add calibration module
  • Set up a GUI experiment flow at a size of 720 x 540
  • Start jAER from Python, then integrate it into the experiment flow (can be done, but there is no smart way of doing it; check the function start_jaer in helpers.py)
  • Find a proper monitor with proper refreshing scheme and test it
  • Generate DVS images from DVS recordings
  • Automatic bounding box labelling within DVS images
  • Frame selection in DVS event sequences
  • Add experiment flow for UCF-50
  • VOT challenge recording with DAViS240C
  • Set up the Ubuntu machine for recordings
  • Update code for Linux platform
  • OpenCV window hangs while a large sequence is being processed (still hangs, but in better shape)
  • Plot frames with different levels of events on a pixel
  • Add support for image dataset
  • Refine frame selection based on statistics of the recordings
  • Calibrate biases so that hot pixels are avoided
  • Refine bounding box generation for DVS recordings
  • Write output function to save frames and bounding boxes
  • An HDF5-based saving interface
  • Experiment flow with UCF-101
  • Experiment flow with Caltech-256
  • Random saccades for image and video frames
  • Figure out how to read AVI from OpenCV on Ubuntu (dead end, use PyAV)
  • Change my Ubuntu Java to Oracle Java installation
  • Find 2 actuators for bringing saccades to the DAViS240C
  • Try to do contrast normalization for frames if needed
  • Try a possible GUI shift from OpenCV to PyQTGraph or VisPy (currently the workflow is fine, but it hangs for a moment while the program is computing)
  • Try to fix I/O issues once and for all with the Python os package
  • Identify bad recordings, generate a list of them, and re-record them (UCF-50 and Caltech-256). [Binary data start from #]
  • Clean code and remove redundant/useless parts

Notes

On configuring jAER

This section currently concerns mainly jAER on Mac OS X.

If you haven't installed jAER before, you can check it out here. jAER is the central component for logging, viewing, and managing DVS recordings. You can set it up as follows:

  • Create an empty folder and enter it from the terminal

    mkdir jaer
    cd jaer
    
  • Check out using svn

    svn co https://svn.code.sf.net/p/jaer/code/jAER/trunk/
    

Technically, you are all set. However, the Mac support for jAER has a viewing problem, and it has been getting worse in recent revisions. Hence, you should roll back a few revisions:

svn up -r8329

The latest revision of jAER is r8272 (2016-03-08). However, since r8329, the rendering of DVS events has been broken. I will keep checking whether they fix this issue and update here.

Follow the instructions below to view DVS events:

  1. Open jAER from the terminal by running (make sure you are in the trunk folder)

    bash jAERViewer1.5_linux.sh
    
  2. Once the viewer has started, hook up your DVS to your Mac

  3. Unhook the DVS from the viewer via Interface -- None (Mac only)

  4. Open another viewer via File -- New Viewer and close the previous viewer (Mac only)

  5. Capture DVS events via View -- Bias/HW Configuration; in the User-Friendly Controls tab, tick Capture events and Display events.

If you are running jAER on Linux, you don't have to open a second viewer as you do on Mac OS X.

Now you should be able to see events like this:

[Figure: DVS events example]

On Basic Technical Specs of DAViS240B

The DVS device used in this project is the DAViS240B. Compared to its previous generation, this camera offers higher resolution and additional functionality. For details, you can go through this info page.

Notice that at the end of the page, it says:

... the area of the array which is completely homogeneous is 190 x 180 for DVS and 170 x 180 for APS and DVS combined.

In the real data collection experiments, we use the DAViS240C in order to get better resolution.

On Remote Control of jAER using Terminal

jAER allows you to control logging activities from a remote terminal via a UDP connection.

Once jAER has started, it listens on port 8997 by default (and on a number of other ports). You can check whether jAER is listening on the port with (Mac only):

lsof -n -i -P | grep UDP

Then you can connect to the AEViewer with:

nc -u localhost 8997

Unlike Windows ncat, you can use nc to establish a UDP connection. However, there is no welcome message or anything of the sort once you have established the connection. Rather, you can straight away type help to display the help message.

Below is the help message you should receive.

startlogging <filename> - starts logging ae data to a file
stoplogging - stops logging ae data to a file
togglesynclogging - starts synchronized logging ae data to a set of files with aeidx filename automatically timstamped
zerotimestamps - zeros timestamps on all AEViewers
>

Furthermore, the filename must be the absolute path of the recording, not a relative path.

On Remote Control of jAER using Python

Once jAER has opened, it starts listening on UDP port 8997 as mentioned above, and you can send logging commands via this port.

  • Since there may be multiple recordings, I separated the socket initialization and close functions from the command-sending function, so you don't have to initialize sockets repeatedly.

  • The first jAER viewer listens on port 8997, the second viewer listens on a different UDP port, 8998, and this ordering continues if you open more viewers. You can then use port 8998 to log recordings.

  • One strange thing is that when you close a viewer, its port doesn't stop listening.

  • I added a function for resetting timestamps across the viewers. However, at the beginning of the recording, the timestamp was not 0 or a small number close to 0, which would probably need a cut afterwards, and the reset consumes a small amount of the viewer's time. (UPDATE: this problem was caused by the viewers not having enough time to reset timestamps before receiving the logging command, so I added a very short delay after resetting timestamps; now the recording starts at a small number near 0 s. This delay is set to 0.008 s on my Mac. It will likely vary between machines, but considering it's only messaging on the local machine, this delay should be fine.)

  • If the port is not being listened on for some reason, the program will wait until the port sends proper feedback.
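
Putting the notes above together, a minimal sketch of this remote-control pattern might look like the following. This is illustrative only: the function names and the recording path are made up, and only the commands come from the help message shown earlier.

    import socket
    import time

    JAER_HOST = "localhost"
    JAER_PORT = 8997  # first viewer; a second viewer would listen on 8998


    def init_jaer_socket():
        """Create one UDP socket that is reused across recordings."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(5.0)
        return sock


    def send_command(sock, command):
        """Send a jAER command and wait for its feedback message."""
        sock.sendto(command.encode("utf-8"), (JAER_HOST, JAER_PORT))
        response, _ = sock.recvfrom(1024)
        return response.decode("utf-8")


    sock = init_jaer_socket()
    send_command(sock, "zerotimestamps")
    time.sleep(0.008)  # give the viewers time to finish resetting timestamps
    send_command(sock, "startlogging /absolute/path/to/recording.aedat")
    time.sleep(10)     # record for 10 seconds
    send_command(sock, "stoplogging")
    sock.close()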

On Running Experiments in Terminal

You can run the experiments from the terminal if you need to:

  • For VOT experiment:

    PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_vot_exp.py with ./configs/dvs_vot_exp_config_linux.json
    
  • For UCF-50 experiment:

    PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf50_exp.py with ./configs/dvs_ucf50_exp_config_linux.json
    
  • For UCF-101 experiment:

    PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf101_exp.py with ./configs/dvs_ucf101_exp_config_linux.json
    
  • For Tracking Dataset experiment:

    PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_tracking_exp.py with ./configs/dvs_tracking_exp_config_linux.json
    
  • For Caltech-256 Dataset experiment:

    PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_caltech256_exp.py with ./configs/dvs_caltech256_exp_config_linux.json

On OpenCV with Python

  • It seems that I failed to display a numpy.ndarray with OpenCV last night (2016-03-08); however, the type of an image read with imread is still numpy.ndarray. I need to figure out the correspondence between the viewers. (It really is a numpy.ndarray.)

  • Creating a large border is time-consuming and should be done beforehand. It can be done within the display interval.

  • For capturing keyboard actions, use the following code:

    k = cv2.waitKey(1) & 0xFF

    Here, the ESC key returns 27, the space bar returns 32, and letter keys return their lowercase ASCII codes. (Seems not to work properly; need to check.)
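
As a quick illustration of these key codes, a minimal display loop could be structured like the sketch below; the window name and blank frame are placeholders, not part of the package.

    import cv2
    import numpy as np

    frame = np.zeros((540, 720, 3), dtype=np.uint8)  # placeholder image
    cv2.namedWindow("test")

    while True:
        cv2.imshow("test", frame)
        k = cv2.waitKey(1) & 0xFF
        if k == 27:          # ESC quits the loop
            break
        elif k == 32:        # space bar
            print("space pressed")
        elif k == ord("a"):  # letter keys return lowercase ASCII codes
            print("'a' pressed")

    cv2.destroyAllWindows()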

On Developing GUI

  • There is no universal way of detecting screen resolution settings in Python (maybe I can do a platform-dependent detection later), so you need to configure the screen's resolution in the config files.

  • OpenCV allows setting a window property to display a full-screen image; however, it keeps crashing on my Mac. You can manually click the window's full-screen button instead.

  • Way to turn on the full-screen property:

    cv2.namedWindow("test", cv2.WND_PROP_FULLSCREEN)
    cv2.setWindowProperty("test", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

  • Loading a long sequence puts pressure on memory.

  • Drawing a full-screen image is slow. [Fixed by using a limited resolution]

  • It somehow looks fine without saccades for images. I should analyze more to find out whether frames can still be extracted.

  • As Tobi suggested, I may use a gray background instead of a black one.

On Generating Groundtruth Bounding Boxes

In object tracking or object detection, there are always bounding boxes. These bounding boxes are hand-labelled by humans and, of course, based on the frames or images they saw. However, this creates a difficulty when you want to convert such datasets to spiking neuromorphic datasets using a DVS. First, the DVS is a fixed low-resolution camera. Second, if an object in a sequence or image is somewhat difficult to detect, you can't even hand-label it afterwards. So far I've thought of two ways of dealing with this issue.

The general scenario is that the frame or image is no larger than the monitor's resolution, and outside the image the display is filled with a single-color static background. Assume the image height is img_h, the image width is img_w, the window height is height, and the window width is width.

  1. Each bounding box is defined by 4 points (x1, y1), (x2, y2), (x3, y3), (x4, y4). We can easily calculate the relative position of a point within the image, and since we also know the resolution of the screen, we can calculate the relative position of the point on the screen. With this approach, you can scale the image flexibly without losing the position of the point. The disadvantage is rather obvious: the DVS has to be positioned carefully to cover the full region of the display.

    [Figures: Bag and Leaves sequences, each showing the original bounded image and the bounded resized image]
  2. The second way needs more time to collect the data; however, it is more flexible with respect to the experiment conditions: as long as the image is fully framed by the DVS, this solution should work. In the first round, we use the DAViS240B to capture DVS events as usual. In the second round, instead of recording DVS events, we record the original frame or image with its bounding boxes. As long as we can find the correspondence between the two rounds, the bounding boxes can be recovered. Of course, this brings more programming challenges.

Both solutions described above allow reasonably accurate automatic bounding box labelling with a proper experiment setup. I will code the first one to start with, since it can be handled easily (see the sketch below).
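
To make the first approach concrete, here is a minimal sketch of the relative-position calculation, assuming for simplicity that the image is stretched to fill the whole window; the function name is hypothetical and not part of the package's API.

    import numpy as np


    def scale_bounding_box(box, img_w, img_h, width, height):
        """Map a 4-point box from image coordinates to window coordinates.

        box holds 8 numbers: (x1, y1, x2, y2, x3, y3, x4, y4).
        """
        points = np.asarray(box, dtype=np.float32).reshape(4, 2)
        relative = points / np.array([img_w, img_h], dtype=np.float32)
        screen = relative * np.array([width, height], dtype=np.float32)
        return screen.reshape(8)


    # e.g. a box on a 640x480 frame displayed in a 720x540 window
    print(scale_bounding_box([10, 20, 110, 20, 110, 90, 10, 90],
                             640, 480, 720, 540))

If the image is instead centered inside a static border, the same relative calculation applies after adding the border offset to each point.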

On Generating Dataset Stats

It is sometimes easier to query data from the datasets with some basic info at hand.

VOT Challenge 2015

vot_stats.pkl has 2 attributes: 'vot_list' and 'num_frames'.

  • vot_list: a numpy string array with 60 sequence names in order
  • num_frames: a hand-coded list with the number of frames for each sequence, in the same order.
[Figures: Bag and Bolt2 bounded examples]
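
A minimal sketch of reading these stats, assuming the file pickles a dictionary keyed by the attribute names above:

    import pickle  # cPickle on Python 2

    with open("vot_stats.pkl", "rb") as f:
        vot_stats = pickle.load(f)

    vot_list = vot_stats["vot_list"]      # 60 sequence names, in order
    num_frames = vot_stats["num_frames"]  # frame count per sequence

    for name, n in zip(vot_list, num_frames):
        print(name, n)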

Tracking Dataset

Tracking Dataset is collected from published literature. There are 77 sequences in total. You can obtain the dataset from here.

tracking_stats.pkl contains statistics of Tracking Dataset and helps you read the dataset.

The attributes are:

  • primary_list: a list consisting of the folder names of the primary categories
  • secondary_list: a dictionary containing all folder names of the secondary categories
    • [primary category name]: a list consisting of the folder names within a particular primary category, e.g. secondary_list['Babenko'] returns ['girl', 'OccludedFace2', 'surfer']
  • [secondary category name]: a list containing all file names in a particular secondary category

WARNING: In the primary category BoBot, each secondary category has one more frame than proposed. You can drop the last frame. Due to this fact, I dropped the last frame during the generation of the statistics.

[Figures: person part occluded, cliff dive 2]
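
Assuming tracking_stats.pkl also pickles a dictionary keyed by the attribute names above, navigating it might look like this sketch:

    import pickle  # cPickle on Python 2

    with open("tracking_stats.pkl", "rb") as f:
        stats = pickle.load(f)

    # walk primary categories, then their secondary categories
    for primary in stats["primary_list"]:
        for secondary in stats["secondary_list"][primary]:
            print(primary, secondary, len(stats[secondary]))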

UCF-50

ucf50_stats.pkl contains statistics of UCF-50 Action Recognition Dataset. The attributes are:

  • ucf50_list: consists of the 50 class names.
  • [class_name]: there are 50 lists that contain the video names of the dataset; each list is named after its class, e.g. BaseballPitch, Basketball, etc.
[Figures: BaseballPitch group 3 clip 4, HorseRace group 3 clip 5]

UCF-101

ucf101_stats.pkl contains statistics of UCF-101 Action Recognition Dataset. The attributes are:

  • ucf101_list: consists of the 101 class names.
  • [class_name]: there are 101 lists that contain the video names of the dataset; each list is named after its class, e.g. ApplyEyeMakeup, BlowDryHair, etc.
[Figures: ApplyEyeMakeup group 4 clip 3, CricketBowling group 1 clip 2]

Caltech-256

caltech256_stats.pkl contains statistics of Caltech-256 Recognition dataset.

The attributes are:

  • caltech256_list: consists of the 257 class names.
  • [class_name]: there are 257 lists that contain the image names of the dataset; each list is named after its class, e.g. 001.ak47, 005.baseball-glove, etc.
[Figures: fireworks No. 19, mountain bike No. 28, theodolite No. 33]

STL-10

Images are saved in binary files. The statistics can be obtained directly from the dataset.
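
For reference, a sketch of reading such a binary file with numpy, following the layout described in the STL-10 documentation (uint8, three 96x96 channels per image, column-major); the file name is just an example:

    import numpy as np

    with open("train_X.bin", "rb") as f:
        raw = np.fromfile(f, dtype=np.uint8)

    # -> (num_images, 96, 96, 3), row-major HxWxC, ready for display
    images = raw.reshape(-1, 3, 96, 96).transpose(0, 3, 2, 1)
    print(images.shape)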

On Generating DVS Image

A few facts:

  • Each aedat logging file can be parsed and saved as 4 variables: timestamps, X positions of the events, Y positions of the events, and polarity data

  • For polarity, 1 is an ON event (becoming brighter); 0 is an OFF event (becoming darker).

  • The timestamps are in units of microseconds.
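
Given these facts, a rough sketch of rendering one DVS image from a slice of parsed events might look like the following; this is illustrative only, not the package's actual routine in dvsproc.py. The 240 x 180 resolution matches the DAViS240 family.

    import numpy as np


    def events_to_image(x_pos, y_pos, pol, width=240, height=180):
        """Accumulate ON/OFF events onto a gray canvas."""
        image = np.full((height, width), 128, dtype=np.int32)
        # ON events brighten a pixel, OFF events darken it
        np.add.at(image, (y_pos, x_pos), np.where(pol == 1, 40, -40))
        return np.clip(image, 0, 255).astype(np.uint8)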

The main challenge is pruning and enhancing the recordings.

  • By using VOT statistics, we should be able to figure out the correspondence between VOT frames and DVS frames. [ON THE WAY]

  • I added 2 extra steps to smooth the recording process. The first is displaying a static background that fills up the entire window; this removes the effect of the first frame of the next video being differenced against the last frame of the previous one. The second is playing the first frame for a few seconds before displaying the sequence; this removes the big difference introduced by the static background. These changes made the sampling more efficient.

  • Playing step by step is somehow different from playing the recordings in jAER. [It's different]

  • The automatic labeling process takes into account the fact that all frames are resized to a 4:3 ratio for recording purposes.

  • At [name needed]'s suggestion, I now aggregate the frames using the total number of frames; this is a rather simple idea and it works well (see the sketch after this list). But I still need to tune this method carefully so the new frames can match the original frames perfectly.

  • The automatic labeling so far works reasonably well using my relative-position-based calculation. However, for sequences that don't match so well, the bounding box will wrongly label a few frames.

  • ⭐[UPDATE: 2016-03-23] Frame selection is generally working; I still need to test it on all recordings. Bounding box labeling is generally working, and I think the major remaining problem is that the DAViS240C is not positioned as well as it could be.

  • ⭐[UPDATE: 2016-03-24] Frame selection still makes some small mistakes. There seem to be some bad events that give wrong event locations. Bounding box generation is generally good for the Tracking Dataset.

  • ⭐[UPDATE: 2016-03-26] I tried recording the video at a lower frequency (30 Hz), and it does improve my labeling.

  • ⭐[UPDATE: 2016-04-01] Fixed a major bug in frame generation; now the bounding boxes are nearly perfect.

  • The above problems may not apply to the image datasets.
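
The frame-count based aggregation mentioned above can be sketched as follows: split the event stream into num_frames bins of equal duration and render one DVS image per bin (events_to_image is the earlier sketch). Again, this is an illustration, not the package's exact method.

    import numpy as np


    def split_events(timestamps, x_pos, y_pos, pol, num_frames):
        """Bin events into num_frames equal-duration windows."""
        edges = np.linspace(timestamps[0], timestamps[-1], num_frames + 1)
        idx = np.searchsorted(timestamps, edges)
        frames = []
        for i in range(num_frames):
            s, e = idx[i], idx[i + 1]
            frames.append(events_to_image(x_pos[s:e], y_pos[s:e], pol[s:e]))
        return frames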

Below are a few examples (old; new, refined examples are coming).

VOT Challenge 2015

[Figures: bag (mismatched), bolt1 (matches well), gymnastics3 (matches well), singer1 (no detailed info at the end)]

Tracking Dataset

[Figures: girl, person part occluded]

On Creating HDF5 Dataset

All recordings of a given dataset are saved in one single HDF5 file. All datasets are shipped in HDF5 format for fast access and a uniform interface across computing platforms.

The design principles for a given set of recordings are:

  • Each recording is a subgroup in HDF5, and there are at least 4 datasets in this subgroup
    • timestamps: saved in int32 in principle
    • x_pos: (0, 240), saved in uint8 in principle
    • y_pos: (0, 180), saved in uint8 in principle
    • pol: (0/1), saved in boolean in principle
    • bounding_boxes: (optional) (num_frames x 8), saved in either float32 (default) or uint8
    • bounding_boxes_timestamp: (optional) (structure TBD), saved in int32 in principle
  • For each recording, there are multiple metadata attributes associated with its subgroup
    • display_freq: display frequency while sampling
    • num_frames: (optional, for video data only) number of frames in the original video
  • For recognition datasets, each recording is not directly attached to the root group; instead, as with saving in folders, a sub-group titled with the category's name is created.
  • Printing the dataset structure before reading data is strongly recommended

While designing the datasets, I tend to preserve as much of the original data as possible. A clear description of each dataset will be published.

If you are new to HDF5, please read the documentation of the h5py package. You can find it here. The quick start guide provides sufficient background knowledge for the package.
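
For example, printing the structure of a dataset file (as recommended above) is only a few lines with h5py; the file name here is hypothetical:

    import h5py


    def print_node(name, obj):
        """Print every group/dataset path together with its type."""
        print(name, obj)


    with h5py.File("spikefuel_dataset.hdf5", "r") as f:
        f.visititems(print_node)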

Tracking Dataset in HDF5

TrackingDataset is stored in HDF5 format. All sequences besides the category "Kalal" are encoded.

The overall structure of TrackingDataset:

root
|
|--- Babenko
  |
  |--- girl
    |--- timestamps
    |--- x_pos
    |--- y_pos
    |--- pol
    |--- bounding_box
  |--- OccludedFace2
    |--- ...
  |--- surfer
    |--- ...
|--- BoBot
  |--- ...
|
|--- Cehovin
  |--- ...
|
...
|--- Wang

This structure follows the original frame-based TrackingDataset exactly.

The metadata attributes associated with the root group are:

Attribute    Value                      Description
device       DAViS240C                  DVS device model used
fps          30                         Internal refresh rate
monitor_id   SAMSUNG SyncMaster 2343BW  Monitor model number
monitor_feq  60                         Monitor display rate

The above attributes describe the experiment conditions and equipment.

There are 12 primary groups in total under the root group:

Babenko, BoBot, Cehovin, Ellis_ijcv2011, Godec, Kwon, Kwon_VTD,
Other, PROST, Ross, Thang, Wang

Within each primary group, there are several recordings; each recording is a subgroup of the corresponding primary group. Each recording group has one metadata attribute, num_frames, which records the number of frames in the original frame-based dataset. Each recording group holds 5 datasets:

Dataset       Data type   Description
timestamps    np.int32    Timestamps of the recording
x_pos         np.uint8    X positions of the recording
y_pos         np.uint8    Y positions of the recording
pol           np.bool     Polarity information of the recording
bounding_box  np.float32  Bounding box information of the recording

IMPORTANT: The first column of bounding_box stores timestamps. Each timestamp marks the first time the object appears at a location in the corresponding frame. These timestamps are generated by the frame generation function available in dvsproc.py.

Furthermore, the bounding boxes for a few sequences are not computed so well due to fuzziness in the recording. Please use them with caution.
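
As an illustration of reading one recording and using these bounding-box timestamps, the sketch below locates the first event at or after each box's timestamp with numpy; the file name is hypothetical, and the group path follows the tree above.

    import h5py
    import numpy as np

    with h5py.File("tracking_dvs.hdf5", "r") as f:
        rec = f["Babenko/girl"]
        timestamps = rec["timestamps"][()]  # np.int32, microseconds
        boxes = rec["bounding_box"][()]     # column 0 holds timestamps

    # index of the first event at or after each bounding-box timestamp
    starts = np.searchsorted(timestamps, boxes[:, 0])
    print(starts[:5])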

UCF-50 in HDF5

UCF-50 is now stored in HDF5 format. The structure follows the original dataset exactly.

Under the root group, there are 50 sub-groups, which represent the 50 categories of the original dataset. Each category group in turn consists of a number of sub-groups that contain the recordings' data. Each recording group holds 4 datasets:

Dataset     Data type  Description
timestamps  np.int32   Timestamps of the recording
x_pos       np.uint8   X positions of the recording
y_pos       np.uint8   Y positions of the recording
pol         np.bool    Polarity information of the recording

The metadata of the root group and each recording group are the same as for TrackingDataset for now.

IMPORTANT: Multiple damaged recordings have been identified. These recordings will be re-recorded and replaced in the dataset in the future.

VOT Challenge Dataset in HDF5

The VOT Challenge Dataset is now stored in HDF5 format. The structure follows the original dataset exactly.

Under the root group, there are 60 recording groups. Each group contains the recording data of the corresponding sequence.

Dataset       Data type   Description
timestamps    np.int32    Timestamps of the recording
x_pos         np.uint8    X positions of the recording
y_pos         np.uint8    Y positions of the recording
pol           np.bool     Polarity information of the recording
bounding_box  np.float32  Bounding box information of the recording

The metadata of the root group and each recording group are the same as for TrackingDataset for now.

IMPORTANT: Multiple damaged recordings have been identified. These recordings will be re-recorded and replaced in the dataset in the future.

Caltech-256 in HDF5

TBD

Contacts

Yuhuang Hu
Email: duguyue100@gmail.com
