SpikeFuel is a toolkit for converting popular visual benchmarks to spiking neuromorphic datasets.
The design principle of this package is to eliminate human intervention during the experiment as much as possible. In this way, the experimenter just needs to set up a proper environment and let the pipeline run.
The general objectives are:
- Precise control of record logging with Python. 🏁
- User interface for showing videos or images in a routine. 🏁
- Experiment configuration system (with JSON style). 🏁
- Post signal analysis and selection tools.
This piece of code is under rapid development, and the author makes stupid mistakes everywhere (typos, messed-up data types, untested code, etc.). Until everything is stable, please use with caution.
At this moment, I haven't written a setup script, so you need to manually install the dependencies listed here.
The scientific Python distribution Anaconda provides most dependencies. I recommend this distribution if you don't want to mess with the system's Python.
System libraries (you don't have to install these yourself):

- `cPickle`
- `glob`
- `os`
- `socket`
- `struct`
- `sys`
- `time`

3rd party packages:

- `numpy` (included in Anaconda)
- `sacred` (install by `pip install sacred`)
- `subprocess32` (install by `pip install subprocess32`)
- Basic drawing function for image and video datasets
- Basic I/O utilities
- GUI with drawing bounding boxes and static background
- Precise remote control of jAER recording via Python
- Test remote control with DVS and jAER.
- Find a way of avoiding the jAER rendering problem on Mac
- Keyboard and mouse support in GUI
- Use JSON as experiment configuration
- Bounding box generation based on relative position
- Test trial for experiment flow
- Find a higher-performance Linux desktop/workstation to work around the memory and processing limits of my Mac (no longer needed)
- Add calibration module
- Set up a GUI experiment flow with a size of 720 x 540
- Start jAER from Python, then integrate it into the experiment flow (can be done, but there is no smart way of doing it; check the function `start_jaer` in `helpers.py`)
- Find a proper monitor with a proper refresh scheme and test it
- Generate DVS images from DVS recordings
- Automatic bounding box labelling within DVS images
- Frame selection in DVS event sequences
- Add experiment flow for UCF-50
- VOT challenge recording with DAViS240C
- Set up the Ubuntu machine for recordings
- Update code for Linux platform
- OpenCV window hangs while a large sequence is being processed (still hangs, but in better shape)
- Plot frames with different levels of events on a pixel
- Add support for image dataset
- Refine frame selection based on statistics of the recordings
- Calibrate biases so that hot pixels are avoided
- Refine bounding box generation for DVS recordings
- Write output function to save frames and bounding boxes
- A HDF5 format based saving interface
- Experiment flow with UCF-101
- Experiment flow with Caltech-256
- Random saccades for image and video frames
- Figure out how to read AVI with OpenCV on Ubuntu (dead end; use `PyAV`)
- Change my Ubuntu Java to an Oracle Java installation
- Find 2 actuators for bringing saccades to the DAViS240C
- Try to do contrast normalization for frames if needed
- Try a possible GUI shift from OpenCV to PyQtGraph or VisPy (currently the workflow is fine, but it hangs for a moment while the program is computing)
- Try to fix I/O issues once and for all with Python's `os` package
- Identify bad recordings, generate a list of them, and re-record them (UCF-50 and Caltech-256). [Binary data starts from #]
- Clean code and remove redundant/useless parts
This section is mainly concerned with jAER on Mac OS X for now.
If you haven't installed jAER before, you can check it out here. jAER is the central component for logging, viewing and managing DVS recordings. You can set it up as follows:
1. Create an empty folder and enter it from the terminal:

   ```bash
   mkdir jaer
   cd jaer
   ```

2. Check out the code using `svn`:

   ```bash
   svn co https://svn.code.sf.net/p/jaer/code/jAER/trunk/
   ```
Technically, you are all set. However, Mac support for jAER has some viewing problems, and they are getting serious in recent revisions. Hence, you should roll back a few revisions:

```bash
svn up -r8329
```

The latest revision of jAER is r8272 (2016-03-08). However, since r8329 the viewing of DVS events has gone wrong. I will keep checking whether they fix this issue and will update here.
Follow the instructions below for viewing DVS events:

1. Open jAER from the terminal by running (make sure you are in the `trunk` folder):

   ```bash
   bash jAERViewer1.5_linux.sh
   ```

2. Once the viewer has started, hook up your DVS to your Mac.
3. Unhook the DVS from the viewer via `Interface -- None` (Mac only).
4. Open another viewer via `File -- New Viewer` and close the previous viewer (Mac only).
5. Capture DVS events via `View -- Bias/HW Configuration`; at the `User-Friendly Controls` tab, tick `Capture events` and `Display events`.
If you are running jAER on a Linux platform, you don't have to open a second viewer as you do on Mac OS X.
Now you should be able to see events like this:

*(figure: DVS Events Example)*
The DVS device used in this project is the DAViS240B. Compared to its previous generation, this camera offers higher resolution and additional functionality. For details, you can go through this info page. Note that at the end of the page, it says:

> ... the area of the array which is completely homogeneous is 190 x 180 for DVS and 170 x 180 for APS and DVS combined.

In the real data collection experiments, we use the DAViS240C in order to get better resolution.
jAER allows you to control logging activities from a remote terminal via a UDP connection.
Once jAER has started, it listens on port 8997 by default (and on a number of other ports). You can check whether jAER is listening on the port by (Mac only):

```bash
lsof -n -i -P | grep UDP
```

Then you can connect to the AEViewer by:

```bash
nc -u localhost 8997
```

Unlike Windows' `ncat`, you can use `nc` to establish a UDP connection. However, there is no welcome message of any kind once you have established the connection. Rather, you can straight away type `help` to display the help message.
Below is the help message you should receive:

```
startlogging <filename> - starts logging ae data to a file
stoplogging - stops logging ae data to a file
togglesynclogging - starts synchronized logging ae data to a set of files with aeidx filename automatically timstamped
zerotimestamps - zeros timestamps on all AEViewers
>
```
Furthermore, `<filename>` requires the absolute path of the recording instead of a relative path.
Once jAER is open, it starts listening on UDP port 8997 as mentioned above, and you can send logging commands via this port.
- Since there may be multiple recordings, I separated the socket initialization and close functions from the command-sending function, so you don't have to initialize sockets repeatedly.
- The first jAER viewer listens on port 8997; the second viewer listens on a different UDP port, 8998, and this order is followed if you open more viewers. You can then use port 8998 to log recordings.
- One strange thing is that when you close a viewer, its port does not stop listening.
- I added a function for resetting timestamps across the viewers. However, at the beginning of the recording, the timestamp is not 0 or a small number close to 0, so the recording probably needs a cut afterwards. The reset also costs the viewer a small amount of time. (UPDATE: this problem was caused by the viewers not having enough time to reset timestamps before receiving the logging command, so I added a very short delay after resetting timestamps; now the recording starts at a small number near 0 s. This delay is set to 0.008 s on my Mac, and it will probably vary across machines; since it's only messaging on the local machine, this delay should be fine.)
- If the port is not being listened on for some reason, the program will wait until the port sends proper feedback. A minimal sketch of this remote-control flow is shown below.
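As a rough illustration of this remote-control flow, here is a minimal sketch in Python. The function names, the reply handling, and the example paths are hypothetical, not the exact code in `helpers.py`; only the commands and the 0.008 s delay come from the notes above:

```python
import socket
import time

def init_socket():
    """Create a UDP socket for talking to a jAER viewer."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_command(sock, cmd, host="localhost", port=8997):
    """Send one remote-control command and wait for jAER's reply."""
    sock.sendto((cmd + "\n").encode("utf-8"), (host, port))
    data, _ = sock.recvfrom(1024)
    return data.decode("utf-8")

sock = init_socket()
send_command(sock, "zerotimestamps")
time.sleep(0.008)  # give the viewers time to reset timestamps
send_command(sock, "startlogging /absolute/path/to/recording.aedat")
time.sleep(5)      # record for 5 seconds
send_command(sock, "stoplogging")
sock.close()
```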
You can run the experiments from the terminal if you need to:
- For the VOT experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_vot_exp.py with ./configs/dvs_vot_exp_config_linux.json
  ```

- For the UCF-50 experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf50_exp.py with ./configs/dvs_ucf50_exp_config_linux.json
  ```

- For the UCF-101 experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_ucf101_exp.py with ./configs/dvs_ucf101_exp_config_linux.json
  ```

- For the Tracking Dataset experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_tracking_exp.py with ./configs/dvs_tracking_exp_config_linux.json
  ```

- For the Caltech-256 Dataset experiment:

  ```bash
  PYTHONPATH=./:$PYTHONPATH python ./scripts/dvs_caltech256_exp.py with ./configs/dvs_caltech256_exp_config_linux.json
  ```
- It seems that I failed to display a `numpy.ndarray` successfully with OpenCV's `imread` last night (2016-03-08); however, the type of the read image is still `numpy.ndarray`. I need to figure out the correspondence between the viewers. (It really is a `numpy.ndarray`.)
- Creating a large border is time-consuming, so it should be created beforehand. This can be done within the interval.
- For capturing keyboard actions, use the following code:

  ```python
  k = cv2.waitKey(1) & 0xFF
  ```

  Here, the ESC key returns `27`, the space bar returns `32`, and letter keys return the corresponding lowercase characters. (Seems not to work properly; need to check.)
- There is no universal way of detecting the screen resolution in Python (maybe I can do a platform-dependent detection later), so you need to configure the screen's resolution in the config files.
- OpenCV allows setting a window property to display a full-screen image; however, it keeps crashing on my Mac. You can manually click the full-screen button on the window instead.
- Way to turn on the full-screen property:

  ```python
  cv2.namedWindow("test", cv2.WND_PROP_FULLSCREEN)
  ```

- Loading a long sequence puts pressure on memory.
- Drawing a full-screen image is slow. [Fixed by using a limited resolution]
- It somehow looks fine without saccades for images. Should analyze more to find out whether frames can still be extracted.
- As Tobi suggested, I may use a gray background instead of a black background. A small display-loop sketch combining these notes follows below.
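For reference, here is a minimal display-loop sketch combining the notes above. The window name, frame size, and gray level are placeholders, and the `setWindowProperty` call is the usual OpenCV companion for full-screen display, not necessarily what SpikeFuel does:

```python
import cv2
import numpy as np

cv2.namedWindow("display", cv2.WND_PROP_FULLSCREEN)
# The usual way to actually switch the window to full screen:
cv2.setWindowProperty("display", cv2.WND_PROP_FULLSCREEN,
                      cv2.WINDOW_FULLSCREEN)

# Gray static background at the GUI resolution mentioned above (720 x 540)
frame = np.full((540, 720, 3), 127, dtype=np.uint8)

while True:
    cv2.imshow("display", frame)
    k = cv2.waitKey(1) & 0xFF
    if k == 27:    # ESC quits
        break
    elif k == 32:  # space bar
        print("space pressed")

cv2.destroyAllWindows()
```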
In object tracking or object detection, there are always bounding boxes. These bounding boxes are hand-labelled by humans and, of course, based on the frames or images they saw. However, this creates a difficulty when you want to convert such datasets to spiking neuromorphic datasets using a DVS. First, a DVS is a fixed low-resolution camera. Second, if an object in the sequence or image is somewhat difficult to detect, you can't even hand-label it afterwards. So far I've thought of two ways of dealing with this issue.
The general scenario is that the frame or image is no larger than the monitor's resolution, and around the image the window is filled with a single-color static background. Assume the image height is `img_h`, the width is `img_w`, the height of the window is `height`, and the width is `width`.
- Each bounding box is defined by 4 points: `(x1, y1), (x2, y2), (x3, y3), (x4, y4)`. So we can easily calculate the relative position of a point within the image, and since we also know the resolution of the screen, we can also calculate the relative position of the point on the screen. With this approach, you can scale the image flexibly without losing the position of the point (see the sketch after this list). The disadvantage is rather obvious: the DVS has to be positioned carefully to cover the full region of the display.

  *(figures: Bag and Leaves, each showing the original bounded image and the bounded resized image)*
- The second way needs more time to collect the data; however, it's more flexible with respect to the experiment conditions: as long as the image is fully framed by the DVS, this solution should work. In the first round, we use the DAViS240B to capture DVS event images as usual. In the second round, instead of recording DVS events, we record the original frames or images with bounding boxes. So as long as we can find the correspondence, the bounding boxes can be recovered. Of course this brings more programming challenges.
Both solutions described above allow fairly accurate automatic bounding box labelling with a proper experiment setup. I will code the first one at the start since it can be handled easily.
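Here is a minimal sketch of the relative-position idea behind the first approach, using the variable names from the scenario above. The function name and the centering/scaling policy are assumptions for illustration, not the exact routine in SpikeFuel:

```python
def image_to_screen(x, y, img_w, img_h, width, height):
    """Map a bounding-box point from image coordinates to window
    coordinates, assuming the image is scaled to fit the window and
    centered on the static background.
    """
    # relative position of the point within the image, in [0, 1]
    rx = x / float(img_w)
    ry = y / float(img_h)

    # scale the image to fit the window while preserving aspect ratio
    scale = min(width / float(img_w), height / float(img_h))
    disp_w, disp_h = int(img_w * scale), int(img_h * scale)

    # offsets of the displayed image inside the window
    off_x = (width - disp_w) // 2
    off_y = (height - disp_h) // 2

    return off_x + rx * disp_w, off_y + ry * disp_h


# a point at the center of a 320 x 240 image shown on a 720 x 540 window
print(image_to_screen(160, 120, 320, 240, 720, 540))  # (360.0, 270.0)
```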
It is sometimes easier to query data from the datasets with some basic info.

`vot_stats.pkl` has 2 attributes: `'vot_list'` and `'num_frames'`.

- `vot_list`: a `numpy` string array with the 60 sequence names in order
- `num_frames`: a hand-coded list of the number of frames of each sequence, in the same order
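For instance, a minimal sketch for loading and querying the statistics file (the path is an assumption, and the pickle is assumed to store a dictionary keyed by the attribute names above):

```python
import pickle

with open("vot_stats.pkl", "rb") as f:
    stats = pickle.load(f)

vot_list = stats["vot_list"]      # 60 sequence names, in order
num_frames = stats["num_frames"]  # frames per sequence, same order

print(vot_list[0], num_frames[0])
```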
*(figures: Bag | Bolt2)*
The Tracking Dataset is collected from published literature. There are 77 sequences in total. You can obtain the dataset from here.
`tracking_stats.pkl` contains statistics of the Tracking Dataset and helps you read the dataset. The attributes are:

- `primary_list`: a list consisting of the folder names of the primary categories
- `secondary_list`: a dictionary containing all folder names of the secondary categories
- `[primary category name]`: a list consisting of the folder names of a particular primary category, e.g. `secondary_list['Babenko']` will return `['girl', 'OccludedFace2', 'surfer']`
- `[secondary category name]`: a list containing all file names in a particular secondary category
WARNING: In the primary category `BoBot`, each secondary category has 1 more frame than proposed. You can drop the last frame. Due to this fact, I dropped the last frame during the generation of the statistics.
*(figures: Person Part Occluded | Cliff Dive 2)*
`ucf50_stats.pkl` contains statistics of the UCF-50 Action Recognition Dataset. The attributes are:

- `ucf50_list`: consists of the 50 class names.
- `[class_name]`: there are 50 lists that contain the video names of the dataset. Each list is named after its class, e.g. `BaseballPitch`, `Basketball`, etc.
*(figures: BaseballPitch Group 3 Clip 4 | HorseRace Group 3 Clip 5)*
`ucf101_stats.pkl` contains statistics of the UCF-101 Action Recognition Dataset. The attributes are:

- `ucf101_list`: consists of the 101 class names.
- `[class_name]`: there are 101 lists that contain the video names of the dataset. Each list is named after its class, e.g. `ApplyEyeMakeup`, `BlowDryHair`, etc.
*(figures: ApplyEyeMakeup Group 4 Clip 3 | CricketBowling Group 1 Clip 2)*
`caltech256_stats.pkl` contains statistics of the Caltech-256 Recognition Dataset. The attributes are:

- `caltech256_list`: consists of the 257 class names.
- `[class_name]`: there are 257 lists that contain the image names of the dataset. Each list is named after its class, e.g. `001.ak47`, `005.baseball-glove`, etc.
*(figures: Fireworks No. 19 | Mountain bike No. 28 | Theodolite No. 33)*
Images are saved in binary files. The statistics can be acquired immediately from the dataset.
A few facts:

- For each `aedat` logging file, you can parse and save 4 variables: timestamps, the X position of the event, the Y position of the event, and the polarity data.
- For polarity, 1 is an ON event, meaning becoming brighter; 0 is an OFF event, meaning becoming darker.
- The timestamps are labelled in units of microseconds. (See the parsing sketch below.)
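For reference, a minimal parsing sketch. The handling of `#`-prefixed header lines matches the note earlier in this README, while the bit layout is an assumption based on jAER's AEDAT 2.0 address convention for DAViS240 cameras; double-check it against your jAER version:

```python
import numpy as np

def load_aedat(filename):
    """Parse an AEDAT 2.0 file into timestamps, x, y, pol (sketch)."""
    with open(filename, "rb") as f:
        pos = 0
        line = f.readline()
        while line.startswith(b"#"):  # skip ASCII header lines
            pos = f.tell()
            line = f.readline()
        f.seek(pos)
        raw = f.read()

    # each event: 32-bit address + 32-bit timestamp, big-endian
    data = np.frombuffer(raw, dtype=">u4")
    addr, timestamps = data[0::2], data[1::2].astype(np.int32)

    x = (addr >> 12) & 0x3FF  # assumed 10-bit X address
    y = (addr >> 22) & 0x1FF  # assumed 9-bit Y address
    pol = (addr >> 11) & 1    # 1 = ON (brighter), 0 = OFF (darker)
    return timestamps, x, y, pol
```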
The main challenge is to prune and enhance the recordings.
- By using the VOT statistics, we should be able to figure out the correspondence between VOT frames and DVS frames. [ON THE WAY]
- I added 2 extra steps to smooth the recording process. The first is displaying a static background that fills up the entire window; this removes the effect of the first frame of the next video computing a difference against the last frame of the previous one. The second is playing the first frame for a few seconds before displaying the sequence; this removes the big difference introduced by the static background. These changes made the sampling more efficient.
- Playing step by step is somehow different from playing the recordings in jAER. [It's different]
- The automatic labelling process takes into account the fact that all frames are resized to a 4:3 ratio for recording purposes.
- Following [name needed]'s suggestion, I now aggregate the frames by using the total number of frames. This is a rather simple idea and it works well, but I still need some careful tuning of this method so the new frames can match the original frames perfectly.
- The automatic labelling so far works reasonably well using my relative-position-based calculation. However, for the sequences that don't match so well, the bounding box will wrongly label a few frames.
- ⭐ [UPDATE: 2016-03-23] Frame selection is generally working; I still need to test it on all recordings. Bounding box labelling is generally working, and I think the major problem is that the DAViS240C is not positioned as well as it could be.
- ⭐ [UPDATE: 2016-03-24] Frame selection still makes some small mistakes. There seem to be some bad events that give wrong event locations. Bounding box generation is generally good for the Tracking Dataset.
- ⭐ [UPDATE: 2016-03-26] I tried to record the videos at a lower frequency (30 Hz), and it does improve my labelling.
- ⭐ [UPDATE: 2016-04-01] Fixed a major bug during frame generation; now the bounding boxes are nearly perfect.
- The above problem may not apply to image datasets.

Below are a few examples (old; new refined examples are coming):
*(figures, VOT Challenge 2015: bag (mismatched), bolt1 (matches well), gymnastics3 (matches well), singer1 (no detailed info at the end); Tracking Dataset: girl, person part occluded)*
All recordings of a given dataset are saved in one single HDF5 file. All datasets are shipped in HDF5 format for fast access and a uniform interface across computing platforms.
The design principles for a given set of recordings are:

- Each recording is a subgroup in HDF5, and there are at least 4 datasets in this subgroup:
  - `timestamps`: saved as int32 in principle
  - `x_pos`: (0, 240), saved as uint8 in principle
  - `y_pos`: (0, 180), saved as uint8 in principle
  - `pol`: (0/1), saved as boolean in principle
  - `bounding_boxes`: (optional) (num_frames x 8), saved as either float32 (default) or uint8
  - `bounding_boxes_timestamp`: (optional) (structure TBD), saved as int32 in principle
- For each recording, there are multiple meta attributes associated with its subgroup:
  - `display_freq`: display frequency while sampling
  - `num_frames`: (optional, for video data only) number of frames in the original video
- For recognition datasets, each recording is not directly attached to the root group; instead, like saving in folders, a subgroup titled with the category's name is created.
- Printing the dataset structure before reading data is strongly recommended.
While designing the datasets, I try to preserve as much of the original data as possible. A clear description of each dataset will be published.
If you are new to HDF5, please read the documentation of the `h5py` package. You can find it here. The quick start guide provides sufficient background knowledge of the package.
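As a quick illustration, here is a minimal sketch for walking one of these files with `h5py` (the file name is an assumption; the group and dataset names follow the structure described below):

```python
import h5py

with h5py.File("TrackingDataset.hdf5", "r") as f:
    # print the dataset structure before reading any data
    f.visit(print)

    # read one recording, e.g. Babenko/girl
    rec = f["Babenko/girl"]
    timestamps = rec["timestamps"][()]
    x_pos, y_pos, pol = rec["x_pos"][()], rec["y_pos"][()], rec["pol"][()]
    print(rec.attrs["num_frames"])
```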
The TrackingDataset is stored in HDF5 format. All sequences besides the category "Kalal" are encoded.
The overall structure of the TrackingDataset:

```
root
|--- Babenko
|    |--- girl
|    |    |--- timestamps
|    |    |--- x_pos
|    |    |--- y_pos
|    |    |--- pol
|    |    |--- bounding_box
|    |--- OccludedFace2
|    |    |--- ...
|    |--- surfer
|         |--- ...
|--- BoBot
|    |--- ...
|--- Cehovin
|    |--- ...
...
|--- Wang
```

This structure follows exactly the original frame-based TrackingDataset.
The metadata attributes associated with the `root` group are:

| Attributes | Value | Description |
|---|---|---|
| `device` | DAViS240C | DVS device model used |
| `fps` | 30 | Internal refresh rate |
| `monitor_id` | SAMSUNG SyncMaster 2343BW | Monitor model number |
| `monitor_feq` | 60 | Monitor display rate |
The above attributes describe the experiment conditions and equipment information. There are 12 primary groups in total under the `root` group:

```
Babenko, BoBot, Cehovin, Ellis_ijcv2011, Godec, Kwon, Kwon_VTD,
Other, PROST, Ross, Thang, Wang
```
For each primary group, there are several recordings; each recording is a subgroup of the corresponding primary group. Each recording group has one metadata attribute, `num_frames`, which records the number of frames in the original frame-based dataset.

For each recording group, there are 5 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
| `bounding_box` | `np.float32` | Bounding box information of the recording |
IMPORTANT: The first column of `bounding_box` stores timestamps. Each timestamp represents when the object appears at a location in a particular frame for the first time. These timestamps are generated based on the frame generation function available in `dvsproc.py`. Furthermore, the bounding boxes of a few sequences are not computed so well due to the fuzziness of the recordings. Please use with caution.
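For example, one way to use those first-column timestamps is to cut the event stream at the frame boundaries. This is a hypothetical sketch (file and group names assumed, as before):

```python
import h5py
import numpy as np

with h5py.File("TrackingDataset.hdf5", "r") as f:
    rec = f["Babenko/girl"]
    ts = rec["timestamps"][()]
    bb = rec["bounding_box"][()]

frame_ts = bb[:, 0].astype(np.int32)  # first column: frame timestamps
# index of the first event belonging to each frame
starts = np.searchsorted(ts, frame_ts)
# events of frame i are then ts[starts[i]:starts[i + 1]]
```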
UCF-50 is now stored in HDF5 format. The structure of the dataset follows the original dataset exactly.

Under the `root` group, there are 50 subgroups which represent the 50 categories of the original dataset. Each category group in turn consists of a number of subgroups that contain the recordings' data.

For each recording group, there are 4 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
The metadata of the `root` group and each recording group are the same as for the TrackingDataset for now.
IMPORTANT: THERE ARE MULTIPLE DAMAGED RECORDINGS IDENTIFIED. THESE RECORDINGS WILL BE RE-RECORDED AND REPLACED IN THE DATASET IN FUTURE.
The VOT Challenge Dataset is now stored in HDF5 format. The structure of the dataset follows the original dataset exactly.

Under the `root` group, there are 60 recording groups. Each group contains the recording data of the corresponding sequence. For each recording group, there are 5 `dataset`s:
| Dataset | Data type | Description |
|---|---|---|
| `timestamps` | `np.int32` | Timestamps of the recording |
| `x_pos` | `np.uint8` | X position of the recording |
| `y_pos` | `np.uint8` | Y position of the recording |
| `pol` | `np.bool` | Polarity information of the recording |
| `bounding_box` | `np.float32` | Bounding box information of the recording |
The metadata of the `root` group and each recording group are the same as for the TrackingDataset for now.
IMPORTANT: THERE ARE MULTIPLE DAMAGED RECORDINGS IDENTIFIED. THESE RECORDINGS WILL BE RE-RECORDED AND REPLACED IN THE DATASET IN FUTURE.
TBD
Yuhuang Hu
Email: duguyue100@gmail.com