SpikeFuel is a toolkit for converting popular visual benchmarks to spiking neuromorphic datasets.
The design principle of this package is to eliminate human intervention during the experiment as much as possible. In this way, the experimenter just needs to set up a proper environment and let the pipeline run.
The general objectives are:
- Precise control of record logging with Python. 🏁
- User interface for showing videos or images in a routine. 🏁
- Experiment configuration system (with JSON style). 🏁
- Post signal analysis and selection tools.
This piece of code is under rapid development, and the author makes stupid mistakes everywhere (typos, messed-up data types, untested code, etc.). Until everything is stable, please use with caution.
At this moment, I haven't written a setup script, so you need to manually install the dependencies listed here.
The scientific Python distribution Anaconda provides most dependencies. I recommend this distribution if you don't want to mess with the system's Python.
System libraries (you don't have to install these yourself):

- `cPickle`
- `glob`
- `os`
- `socket`
- `struct`
- `sys`
- `time`

3rd party packages:

- `numpy` (included in Anaconda)
- `sacred` (install by `pip install sacred`)
- `subprocess32` (install by `pip install subprocess32`)
- Basic drawing function for image and video datasets
- Basic I/O utilities
- GUI with drawing bounding boxes and static background
- Precise remote control of jAER recording via Python
- Test remote control with DVS and jAER.
- Find a way of avoiding the jAER rendering problem on Mac
- Keyboard and mouse support in GUI
- Use JSON as experiment configuration
- Bounding box generation based on relative position
- Test trial for experiment flow
- Find a higher-performance Linux desktop/workstation to work around the memory and processing limits of my Mac (no longer needed)
- Add calibration module
- Set up a GUI experiment flow with a size of 720 x 540
- Start jAER from Python, then integrate it into the experiment flow (can be done, but there is no smart way of doing it; check the function `start_jaer` in `helpers.py`)
- Find a proper monitor with a proper refresh scheme and test it
- Generate DVS images from DVS recordings
- Automatic bounding box labelling within DVS images
- Frame selection in DVS event sequences
- Add experiment flow for UCF-50
- VOT challenge recording with DAViS240C
- Set up the Ubuntu machine for recordings
- Update code for Linux platform
- OpenCV window hangs while a large sequence is being processed (still hangs, but in better shape)
- Plot frames with different levels of events on a pixel
- Add support for image dataset
- Refine frame selection based on statistics of the recordings
- Calibrate biases so that hot pixels are avoided
- Refine bounding box generation for DVS recordings
- Write output function to save frames and bounding boxes
- A HDF5 format based saving interface
- Experiment flow with UCF-101
- Experiment flow with Caltech-256
- Random saccades for image and video frames
- Figure out how to read AVI with OpenCV on Ubuntu (dead end; use `PyAV`)
- Change my Ubuntu Java to an Oracle Java installation
- Find 2 actuators for bringing saccades to the DAViS240C
- Try to do contrast normalization for frames if needed
- Try a possible GUI shift from OpenCV to PyQtGraph or VisPy (currently the workflow is fine, but it hangs for a moment while the program is computing)
- Try to fix I/O issues once and for all with Python's `os` package
- Identify bad recordings, generate a list of them, and re-record them (UCF-50 and Caltech-256). [Binary data starts from #]
- Clean code and remove redundant/useless parts
This section is mainly concerned with jAER on Mac OS X for now.
If you haven't installed jAER before, you can check it out here. jAER is the central component for logging, viewing and managing DVS recordings. You can set it up as follows:
1. Create an empty folder and enter it from the terminal:

   ```bash
   mkdir jaer
   cd jaer
   ```

2. Check out the code using `svn`:

   ```bash
   svn co https://svn.code.sf.net/p/jaer/code/jAER/trunk/
   ```
Technically, you are all set. However, Mac support for jAER has some viewing problems, and they are getting serious in recent revisions. Hence, you should roll back a few revisions:

```bash
svn up -r8329
```

The latest revision of jAER is r8272 (2016-03-08). However, since r8329 the viewing of DVS events has gone wrong. I will keep checking whether they fix this issue and will update here.
Follow the instructions below for viewing DVS events:

1. Open jAER from the terminal by running (make sure you are in the `trunk` folder):

   ```bash
   bash jAERViewer1.5_linux.sh
   ```

2. Once the viewer has started, hook up your DVS to your Mac.
3. Unhook the DVS from the viewer via `Interface -- None` (Mac only).
4. Open another viewer via `File -- New Viewer` and close the previous viewer (Mac only).
5. Capture DVS events via `View -- Bias/HW Configuration`; at the `User-Friendly Controls` tab, tick `Capture events` and `Display events`.
If you are running jAER on a Linux platform, you don't have to open a second viewer as you do on Mac OS X.
Now you should be able to see events like this:

*(figure: DVS Events Example)*
The DVS device used in this project is the DAViS240B. Compared to its previous generation, this camera offers higher resolution and additional functionality. For details, you can go through this info page. Note that at the end of the page, it says:

> ... the area of the array which is completely homogeneous is 190 x 180 for DVS and 170 x 180 for APS and DVS combined.

In the real data collection experiments, we use the DAViS240C in order to get better resolution.
jAER allows you to control logging activities from a remote terminal via a UDP connection.
Once jAER has started, it listens on port 8997 by default (and on a number of other ports). You can check whether jAER is listening on the port by (Mac only):

```bash
lsof -n -i -P | grep UDP
```

Then you can connect to the AEViewer by:

```bash
nc -u localhost 8997
```

Unlike Windows' `ncat`, you can use `nc` to establish a UDP connection. However, there is no welcome message of any kind once you have established the connection. Rather, you can straight away type `help` to display the help message.
Below is the help message you should receive:

```
startlogging <filename> - starts logging ae data to a file
stoplogging - stops logging ae data to a file
togglesynclogging - starts synchronized logging ae data to a set of files with aeidx filename automatically timstamped
zerotimestamps - zeros timestamps on all AEViewers
>
```
Furthermore, `<filename>` requires the absolute path of the recording instead of a relative path.
Once jAER is open, it starts listening on UDP port 8997 as mentioned above, and you can send logging commands via this port.
- Since there may be multiple recordings, I separated the socket initialization and close functions from the command-sending function, so you don't have to initialize sockets repeatedly.
- The first jAER viewer listens on port 8997; the second viewer listens on a different UDP port, 8998, and this order is followed if you open more viewers. You can then use port 8998 to log recordings.
- One strange thing is that when you close a viewer, its port does not stop listening.
- I added a function for resetting timestamps across the viewers. However, at the beginning of the recording, the timestamp is not 0 or a small number close to 0, so the recording probably needs a cut afterwards. The reset also costs the viewer a small amount of time. (UPDATE: this problem was caused by the viewers not having enough time to reset timestamps before receiving the logging command, so I added a very short delay after resetting timestamps; now the recording starts at a small number near 0 s. This delay is set to 0.008 s on my Mac, and it will probably vary across machines; since it's only messaging on the local machine, this delay should be fine.)
- If the port is not being listened on for some reason, the program will wait until the port sends proper feedback. A minimal sketch of this remote-control flow is shown below.
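As a rough illustration of this remote-control flow, here is a minimal sketch in Python. The function names, the reply handling, and the example paths are hypothetical, not the exact code in `helpers.py`; only the commands and the 0.008 s delay come from the notes above:

```python
import socket
import time

def init_socket():
    """Create a UDP socket for talking to a jAER viewer."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_command(sock, cmd, host="localhost", port=8997):
    """Send one remote-control command and wait for jAER's reply."""
    sock.sendto((cmd + "\n").encode("utf-8"), (host, port))
    data, _ = sock.recvfrom(1024)
    return data.decode("utf-8")

sock = init_socket()
send_command(sock, "zerotimestamps")
time.sleep(0.008)  # give the viewers time to reset timestamps
send_command(sock, "startlogging /absolute/path/to/recording.aedat")
time.sleep(5)      # record for 5 seconds
send_command(sock, "stoplogging")
sock.close()
```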
You can run the experiments from the terminal if you need to:
- For the VOT experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_vot_exp.py with ./configs/dvs_vot_exp_config_linux.json
  ```

- For the UCF-50 experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf50_exp.py with ./configs/dvs_ucf50_exp_config_linux.json
  ```

- For the UCF-101 experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf101_exp.py with ./configs/dvs_ucf101_exp_config_linux.json
  ```

- For the Tracking Dataset experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_tracking_exp.py with ./configs/dvs_tracking_exp_config_linux.json
  ```

- For the Caltech-256 Dataset experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_caltech256_exp.py with ./configs/dvs_caltech256_exp_config_linux.json
  ```
- It seems that I failed to display a `numpy.ndarray` successfully with OpenCV's `imread` last night (2016-03-08); however, the type of the read image is still `numpy.ndarray`. I need to figure out the correspondence between the viewers. (It really is a `numpy.ndarray`.)
- Creating a large border is time-consuming, so it should be created beforehand. This can be done within the interval.
- For capturing keyboard actions, use the following code:

  ```python
  k = cv2.waitKey(1) & 0xFF
  ```

  Here, the ESC key returns `27`, the space bar returns `32`, and letter keys return the corresponding lowercase characters. (Seems not to work properly; need to check.)
- There is no universal way of detecting the screen resolution in Python (maybe I can do a platform-dependent detection later), so you need to configure the screen's resolution in the config files.
- OpenCV allows setting a window property to display a full-screen image; however, it keeps crashing on my Mac. You can manually click the full-screen button on the window instead.
- Way to turn on the full-screen property:

  ```python
  cv2.namedWindow("test", cv2.WND_PROP_FULLSCREEN)
  ```

- Loading a long sequence puts pressure on memory.
- Drawing a full-screen image is slow. [Fixed by using a limited resolution]
- It somehow looks fine without saccades for images. Should analyze more to find out whether frames can still be extracted.
- As Tobi suggested, I may use a gray background instead of a black background. A small display-loop sketch combining these notes follows below.
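For reference, here is a minimal display-loop sketch combining the notes above. The window name, frame size, and gray level are placeholders, and the `setWindowProperty` call is the usual OpenCV companion for full-screen display, not necessarily what SpikeFuel does:

```python
import cv2
import numpy as np

cv2.namedWindow("display", cv2.WND_PROP_FULLSCREEN)
# The usual way to actually switch the window to full screen:
cv2.setWindowProperty("display", cv2.WND_PROP_FULLSCREEN,
                      cv2.WINDOW_FULLSCREEN)

# Gray static background at the GUI resolution mentioned above (720 x 540)
frame = np.full((540, 720, 3), 127, dtype=np.uint8)

while True:
    cv2.imshow("display", frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:    # ESC quits
        break
    elif k == 32:  # space bar
        print("space pressed")

cv2.destroyAllWindows()
```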
In object tracking or object detection, there are always bounding boxes. These bounding boxes are hand-labelled by humans and, of course, based on the frames or images they saw. However, this creates a difficulty when you want to convert such datasets to spiking neuromorphic datasets using a DVS. First, a DVS is a fixed low-resolution camera. Second, if an object in the sequence or image is somewhat difficult to detect, you can't even hand-label it afterwards. So far I've thought of two ways of dealing with this issue.
The general scenario is that the frame or image is no larger than the monitor's resolution, and around the image the window is filled with a single-color static background. Assume the image height is `img_h`, the width is `img_w`, the height of the window is `height`, and the width is `width`.
- Each bounding box is defined by 4 points: `(x1, y1), (x2, y2), (x3, y3), (x4, y4)`. So we can easily calculate the relative position of a point within the image, and since we also know the resolution of the screen, we can also calculate the relative position of the point on the screen. With this approach, you can scale the image flexibly without losing the position of the point (see the sketch after this list). The disadvantage is rather obvious: the DVS has to be positioned carefully to cover the full region of the display.

  *(figures: Bag and Leaves, each showing the original bounded image and the bounded resized image)*
- The second way needs more time to collect the data; however, it's more flexible with respect to the experiment conditions: as long as the image is fully framed by the DVS, this solution should work. In the first round, we use the DAViS240B to capture DVS event images as usual. In the second round, instead of recording DVS events, we record the original frames or images with bounding boxes. So as long as we can find the correspondence, the bounding boxes can be recovered. Of course this brings more programming challenges.
Both solutions described above allow fairly accurate automatic bounding box labelling with a proper experiment setup. I will code the first one at the start since it can be handled easily.
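Here is a minimal sketch of the relative-position idea behind the first approach, using the variable names from the scenario above. The function name and the centering/scaling policy are assumptions for illustration, not the exact routine in SpikeFuel:

```python
def image_to_screen(x, y, img_w, img_h, width, height):
    """Map a bounding-box point from image coordinates to window
    coordinates, assuming the image is scaled to fit the window and
    centered on the static background.
    """
    # relative position of the point within the image, in [0, 1]
    rx = x / float(img_w)
    ry = y / float(img_h)

    # scale the image to fit the window while preserving aspect ratio
    scale = min(width / float(img_w), height / float(img_h))
    disp_w, disp_h = int(img_w * scale), int(img_h * scale)

    # offsets of the displayed image inside the window
    off_x = (width - disp_w) // 2
    off_y = (height - disp_h) // 2

    return off_x + rx * disp_w, off_y + ry * disp_h


# a point at the center of a 320 x 240 image shown on a 720 x 540 window
print(image_to_screen(160, 120, 320, 240, 720, 540))  # (360.0, 270.0)
```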
It is sometimes easier to query data from the datasets with some basic info.

`vot_stats.pkl` has 2 attributes: `'vot_list'` and `'num_frames'`.

- `vot_list`: a `numpy` string array with the 60 sequence names in order
- `num_frames`: a hand-coded list of the number of frames of each sequence, in the same order
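For instance, a minimal sketch for loading and querying the statistics file (the path is an assumption, and the pickle is assumed to store a dictionary keyed by the attribute names above):

```python
import pickle

with open("vot_stats.pkl", "rb") as f:
    stats = pickle.load(f)

vot_list = stats["vot_list"]      # 60 sequence names, in order
num_frames = stats["num_frames"]  # frames per sequence, same order

print(vot_list[0], num_frames[0])
```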
*(figures: Bag | Bolt2)*
The Tracking Dataset is collected from published literature. There are 77 sequences in total. You can obtain the dataset from here.
`tracking_stats.pkl` contains statistics of the Tracking Dataset and helps you read the dataset. The attributes are:

- `primary_list`: a list consisting of the folder names of the primary categories
- `secondary_list`: a dictionary containing all folder names of the secondary categories
- `[primary category name]`: a list consisting of the folder names of a particular primary category, e.g. `secondary_list['Babenko']` will return `['girl', 'OccludedFace2', 'surfer']`
- `[secondary category name]`: a list containing all file names in a particular secondary category
WARNING: In the primary category `BoBot`, each secondary category has 1 more frame than proposed. You can drop the last frame. Due to this fact, I dropped the last frame during the generation of the statistics.
*(figures: Person Part Occluded | Cliff Dive 2)*
`ucf50_stats.pkl` contains statistics of the UCF-50 Action Recognition Dataset. The attributes are:

- `ucf50_list`: consists of the 50 class names.
- `[class_name]`: there are 50 lists that contain the video names of the dataset. Each list is named after its class, e.g. `BaseballPitch`, `Basketball`, etc.
*(figures: BaseballPitch Group 3 Clip 4 | HorseRace Group 3 Clip 5)*
`ucf101_stats.pkl` contains statistics of the UCF-101 Action Recognition Dataset. The attributes are:

- `ucf101_list`: consists of the 101 class names.
- `[class_name]`: there are 101 lists that contain the video names of the dataset. Each list is named after its class, e.g. `ApplyEyeMakeup`, `BlowDryHair`, etc.
*(figures: ApplyEyeMakeup Group 4 Clip 3 | CricketBowling Group 1 Clip 2)*
`caltech256_stats.pkl` contains statistics of the Caltech-256 Recognition Dataset. The attributes are:

- `caltech256_list`: consists of the 257 class names.
- `[class_name]`: there are 257 lists that contain the image names of the dataset. Each list is named after its class, e.g. `001.ak47`, `005.baseball-glove`, etc.
*(figures: Fireworks No. 19 | Mountain bike No. 28 | Theodolite No. 33)*
Images are saved in binary files. The statistics can be acquired immediately from the dataset.
A few facts:

- For each `aedat` logging file, you can parse and save 4 variables: timestamps, the X position of the event, the Y position of the event, and the polarity data.
- For polarity, 1 is an ON event, meaning becoming brighter; 0 is an OFF event, meaning becoming darker.
- The timestamps are labelled in units of microseconds. (See the parsing sketch below.)
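For reference, a minimal parsing sketch. The handling of `#`-prefixed header lines matches the note earlier in this README, while the bit layout is an assumption based on jAER's AEDAT 2.0 address convention for DAViS240 cameras; double-check it against your jAER version:

```python
import numpy as np

def load_aedat(filename):
    """Parse an AEDAT 2.0 file into timestamps, x, y, pol (sketch)."""
    with open(filename, "rb") as f:
        pos = 0
        line = f.readline()
        while line.startswith(b"#"):  # skip ASCII header lines
            pos = f.tell()
            line = f.readline()
        f.seek(pos)
        raw = f.read()

    # each event: 32-bit address + 32-bit timestamp, big-endian
    data = np.frombuffer(raw, dtype=">u4")
    addr, timestamps = data[0::2], data[1::2].astype(np.int32)

    x = (addr >> 12) & 0x3FF  # assumed 10-bit X address
    y = (addr >> 22) & 0x1FF  # assumed 9-bit Y address
    pol = (addr >> 11) & 1    # 1 = ON (brighter), 0 = OFF (darker)
    return timestamps, x, y, pol
```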
The main challenge is to prune and enhance the recordings.
- By using the VOT statistics, we should be able to figure out the correspondence between VOT frames and DVS frames. [ON THE WAY]
- I added 2 extra steps to smooth the recording process. The first is displaying a static background that fills up the entire window; this removes the effect of the first frame of the next video computing a difference against the last frame of the previous one. The second is playing the first frame for a few seconds before displaying the sequence; this removes the big difference introduced by the static background. These changes made the sampling more efficient.
- Playing step by step is somehow different from playing the recordings in jAER. [It's different]
- The automatic labelling process takes into account the fact that all frames are resized to a 4:3 ratio for recording purposes.
- Following [name needed]'s suggestion, I now aggregate the frames by using the total number of frames. This is a rather simple idea and it works well, but I still need some careful tuning of this method so the new frames can match the original frames perfectly.
- The automatic labelling so far works reasonably well using my relative-position-based calculation. However, for the sequences that don't match so well, the bounding box will wrongly label a few frames.
- ⭐ [UPDATE: 2016-03-23] Frame selection is generally working; I still need to test it on all recordings. Bounding box labelling is generally working, and I think the major problem is that the DAViS240C is not positioned as well as it could be.
- ⭐ [UPDATE: 2016-03-24] Frame selection still makes some small mistakes. There seem to be some bad events that give wrong event locations. Bounding box generation is generally good for the Tracking Dataset.
- ⭐ [UPDATE: 2016-03-26] I tried to record the videos at a lower frequency (30 Hz), and it does improve my labelling.
- ⭐ [UPDATE: 2016-04-01] Fixed a major bug during frame generation; now the bounding boxes are nearly perfect.
- The above problem may not apply to image datasets.

Below are a few examples (old; new refined examples are coming):
*(figures, VOT Challenge 2015: bag (mismatched), bolt1 (matches well), gymnastics3 (matches well), singer1 (no detailed info at the end); Tracking Dataset: girl, person part occluded)*
All recordings of a given dataset are saved in one single HDF5 file. All datasets are shipped in HDF5 format for fast access and a uniform interface across computing platforms.
The design principles for a given set of recordings are:

- Each recording is a subgroup in HDF5, and there are at least 4 datasets in this subgroup:
  - `timestamps`: saved as int32 in principle
  - `x_pos`: (0, 240), saved as uint8 in principle
  - `y_pos`: (0, 180), saved as uint8 in principle
  - `pol`: (0/1), saved as boolean in principle
  - `bounding_boxes`: (optional) (num_frames x 8), saved as either float32 (default) or uint8
  - `bounding_boxes_timestamp`: (optional) (structure TBD), saved as int32 in principle
- For each recording, there are multiple meta attributes associated with its subgroup:
  - `display_freq`: display frequency while sampling
  - `num_frames`: (optional, for video data only) number of frames in the original video
- For recognition datasets, each recording is not directly attached to the root group; instead, like saving in folders, a subgroup titled with the category's name is created.
- Printing the dataset structure before reading data is strongly recommended.
While designing the datasets, I try to preserve as much of the original data as possible. A clear description of each dataset will be published.
If you are new to HDF5, please read the documentation of the `h5py` package. You can find it here. The quick start guide provides sufficient background knowledge of the package.
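As a quick illustration, here is a minimal sketch for walking one of these files with `h5py` (the file name is an assumption; the group and dataset names follow the structure described below):

```python
import h5py

with h5py.File("TrackingDataset.hdf5", "r") as f:
    # print the dataset structure before reading any data
    f.visit(print)

    # read one recording, e.g. Babenko/girl
    rec = f["Babenko/girl"]
    timestamps = rec["timestamps"][()]
    x_pos, y_pos, pol = rec["x_pos"][()], rec["y_pos"][()], rec["pol"][()]
    print(rec.attrs["num_frames"])
```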
The TrackingDataset is stored in HDF5 format. All sequences besides the category "Kalal" are encoded.
The overall structure of the TrackingDataset:

```
root
|--- Babenko
|    |--- girl
|    |    |--- timestamps
|    |    |--- x_pos
|    |    |--- y_pos
|    |    |--- pol
|    |    |--- bounding_box
|    |--- OccludedFace2
|    |    |--- ...
|    |--- surfer
|         |--- ...
|--- BoBot
|    |--- ...
|--- Cehovin
|    |--- ...
...
|--- Wang
```

This structure follows exactly the original frame-based TrackingDataset.
The metadata attributes associated with the `root` group are:

| Attributes | Value | Description |
|---|---|---|
| `device` | DAViS240C | DVS device model used |
| `fps` | 30 | Internal refresh rate |
| `monitor_id` | SAMSUNG SyncMaster 2343BW | Monitor model number |
| `monitor_feq` | 60 | Monitor display rate |
The above attributes describe the experiment conditions and equipment information. There are 12 primary groups in total under the `root` group:

```
Babenko, BoBot, Cehovin, Ellis_ijcv2011, Godec, Kwon, Kwon_VTD,
Other, PROST, Ross, Thang, Wang
```
For each primary group, there are several recordings; each recording is a subgroup of the corresponding primary group. Each recording group has one metadata attribute, `num_frames`, which records the number of frames in the original frame-based dataset.

For each recording group, there are 5 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
| `bounding_box` | `np.float32` | Bounding box information of the recording |
IMPORTANT: The first column of `bounding_box` stores timestamps. Each timestamp represents when the object appears at a location in a particular frame for the first time. These timestamps are generated based on the frame generation function available in `dvsproc.py`. Furthermore, the bounding boxes of a few sequences are not computed so well due to the fuzziness of the recordings. Please use with caution.
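For example, one way to use those first-column timestamps is to cut the event stream at the frame boundaries. This is a hypothetical sketch (file and group names assumed, as before):

```python
import h5py
import numpy as np

with h5py.File("TrackingDataset.hdf5", "r") as f:
    rec = f["Babenko/girl"]
    ts = rec["timestamps"][()]
    bb = rec["bounding_box"][()]

frame_ts = bb[:, 0].astype(np.int32)  # first column: frame timestamps
# index of the first event belonging to each frame
starts = np.searchsorted(ts, frame_ts)
# events of frame i are then ts[starts[i]:starts[i + 1]]
```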
UCF-50 is now stored in HDF5 format. The structure of the dataset follows the original dataset exactly.

Under the `root` group, there are 50 subgroups which represent the 50 categories of the original dataset. Each category group in turn consists of a number of subgroups that contain the recordings' data.

For each recording group, there are 4 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
The metadata of the `root` group and each recording group are the same as for the TrackingDataset for now.
IMPORTANT: THERE ARE MULTIPLE DAMAGED RECORDINGS IDENTIFIED. THESE RECORDINGS WILL BE RE-RECORDED AND REPLACED IN THE DATASET IN FUTURE.
The VOT Challenge Dataset is now stored in HDF5 format. The structure of the dataset follows the original dataset exactly.

Under the `root` group, there are 60 recording groups. Each group contains the recording data of the corresponding sequence. For each recording group, there are 5 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
| `bounding_box` | `np.float32` | Bounding box information of the recording |
The metadata of the `root` group and each recording group are the same as for the TrackingDataset for now.
IMPORTANT: THERE ARE MULTIPLE DAMAGED RECORDINGS IDENTIFIED. THESE RECORDINGS WILL BE RE-RECORDED AND REPLACED IN THE DATASET IN FUTURE.
TBD
Yuhuang Hu
Email: duguyue100@gmail.com