This is the supplementary data and code for the paper:
Panser K, Tirian L, Schulze F, Villalba S, Jefferis GSXE, Bühler K, Straw AD. (2016) Automatic segmentation of Drosophila neural compartments using GAL4 expression data reveals novel visual pathways. Current Biology.
Please check our website at https://strawlab.org/braincode for updates and an interactive data browser. Updated versions of this source code may be found at https://github.com/strawlab/braincode.
This directory contains three subdirectories:
- braincode: the source code
- clustering-data: the results of the published clusterings and agglomerations
- 3d-models: models of the segmented VPNs, optic glomeruli and so on.
There are three types of data files included in the clustering-data directory:
- Samples HDF5 file (data/<dataset>/<region>/<region>_samples.h5)
- Clustering result volume file (data/<dataset>/<region>/<clustering_type>/<region>_clusterimage.nrrd)
- Cross-reference file (data/<dataset>/id_driver_image.csv)
The samples files (.h5) are in the HDF5 format. They contain the expression data used for the clustering. The data have been subsampled from the original registered confocal stacks and are restricted to the brain region on which the clustering was performed.
Internally, the structure of each .h5 file is:
/
├── ids
│ ├── 247_217_115
│ ├── 247_217_118
│ └── <x>_<y>_<z>
├── positionKeys
├── size
├── stepXY
└── stepZ
The ids group contains many individual datasets named <x>_<y>_<z>, where <x>, <y>, and <z> are the coordinates of the sampled voxel. Each dataset is a list of ids, where each id corresponds to a particular confocal stack and driver line. An id is listed for a voxel only if the expression in that stack exceeds a threshold value in that voxel.
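As an illustration of how the ids group could be traversed, the following minimal sketch parses the <x>_<y>_<z> dataset names and inverts the voxel-to-ids mapping into an id-to-voxels mapping. It is not part of the published code. The function names are hypothetical, and it assumes that each dataset is a flat list of integer ids. It works on any mapping from dataset name to a list of ids, so the example uses a plain dictionary with made-up ids.

```python
from collections import defaultdict


def parse_voxel_key(name):
    """Split a dataset name such as '247_217_115' into integer (x, y, z)."""
    x, y, z = (int(part) for part in name.split("_"))
    return x, y, z


def voxels_per_id(ids_group):
    """Invert voxel -> ids into id -> list of (x, y, z) voxels.

    ids_group: any mapping from dataset name to an iterable of ids
    (assumed here to be a flat list of integers).
    """
    result = defaultdict(list)
    for name, id_list in ids_group.items():
        coord = parse_voxel_key(name)
        for stack_id in id_list:
            result[int(stack_id)].append(coord)
    return dict(result)


# Hypothetical example data standing in for the contents of an ids group.
example_ids = {"247_217_115": [3, 7], "247_217_118": [7]}
print(voxels_per_id(example_ids))
```

With the h5py library, the ids group of a file opened with h5py.File(path, "r") should be usable in place of the dictionary, since h5py groups provide an items() method. This has not been checked against the actual files.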
The positionKeys dataset is a comma-separated string of all considered positions within the given region. These coordinates are in the original space, not the downsampled one.
The size dataset specifies the number of voxels in the original (not downsampled) volume.
The stepXY and stepZ datasets specify the amount by which the clustering result volume (.nrrd file) has been downsampled in XY and in Z, respectively.
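The relationship between original coordinates and the downsampled volume might be sketched as follows. This assumes, without confirmation from the source, that downsampling is plain striding, so that an original coordinate maps to a downsampled index by integer division by the step. The function names and the example values are hypothetical.

```python
def to_downsampled(coord, step_xy, step_z):
    """Map an original (x, y, z) coordinate to a downsampled voxel index.

    Assumption: downsampling is simple striding (integer division by the step).
    """
    x, y, z = coord
    return (x // step_xy, y // step_xy, z // step_z)


def to_original(index, step_xy, step_z):
    """Map a downsampled voxel index back to the origin of its original block."""
    i, j, k = index
    return (i * step_xy, j * step_xy, k * step_z)


# Made-up step values, for illustration only.
print(to_downsampled((10, 21, 9), step_xy=5, step_z=3))
print(to_original((2, 4, 3), step_xy=5, step_z=3))
```

The actual values of stepXY and stepZ should be read from the .h5 file rather than hard-coded.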
The clustering result volume files (.nrrd) are in the NRRD format and contain the volumetric coordinates of each cluster identified by the clustering algorithm. Each voxel is assigned zero (for no cluster) or an integer that defines the cluster number to which the voxel belongs.
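Because each voxel holds either zero or a cluster number, the size of each cluster can be obtained by counting labels. The sketch below does this for any flat sequence of labels. The function name is hypothetical and the example labels are made up.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count voxels per cluster number, ignoring 0 (no cluster)."""
    counts = Counter(int(value) for value in labels)
    counts.pop(0, None)  # label 0 means the voxel belongs to no cluster
    return dict(counts)


# Made-up flattened label volume, for illustration only.
print(cluster_sizes([0, 1, 1, 2, 0, 2, 2]))
```

One way to obtain the labels is the pynrrd library, whose nrrd.read(filename) returns the voxel array and the header. The flattened array could then be passed to cluster_sizes. For large volumes, numpy.unique with return_counts=True is likely faster.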
The cross-reference file (id_driver_image.csv) gives the correspondence between an integer id, the file name of the confocal stack corresponding to that id, and the driver line from which the confocal stack was made.
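A lookup table from id to driver line and stack file name could be built with the standard csv module, as in the sketch below. The column order (id, driver, image) is only guessed from the file name id_driver_image.csv, and the presence of a header row is also an assumption. Both should be checked against the actual file. The function name and the sample rows are hypothetical.

```python
import csv
import io


def load_cross_reference(lines, has_header=True):
    """Return {id: (driver, image)} from CSV text.

    Assumption: columns are ordered id, driver, image.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the (assumed) header row
    table = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        stack_id, driver, image = row[0], row[1], row[2]
        table[int(stack_id)] = (driver.strip(), image.strip())
    return table


# Made-up rows, for illustration only.
sample = "id,driver,image\n1,DRIVER_A,stack_a.am\n2,DRIVER_B,stack_b.am\n"
print(load_cross_reference(io.StringIO(sample)))
```

For a file on disk, an open file object (opened with newline="") can be passed in place of the io.StringIO object.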
The source code includes the following files:
- kmedoids_salspaugh.py: The kmedoids algorithm
- dice.py: The Dice coefficient algorithm
- calculate_distance.py: Compute the voxel-to-voxel distance matrix for a given brain region
- perform_clustering.py: Run the kmedoids clustering algorithm
- util.py: Various utilities
- plot_distance_matrix.py: Create a plot showing the voxel-to-voxel distance matrix for a given clustering result
- fragments_per_cluster_step1_compute.py: Compute which driver lines are expressed in which clusters
- fragments_per_cluster_step2_save_csv.py: Save CSV file with the data per driver line
- fragments_per_cluster_step3_csv_to_json.py: Convert CSV file with all rows to a JSON file with only significant rows
- calculate_cluster_stats.py: Measure distances between medoids (inter-cluster) and between voxels in cluster (intra-cluster)
- save_cluster_info_json.py: Save cluster statistics
- stability.py: Evaluate the stability of clusterings
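To make the idea of a voxel-to-voxel distance concrete, here is a minimal sketch of the standard Dice coefficient, 2|A∩B| / (|A| + |B|), applied to the id lists of two voxels. It is not the code in dice.py or calculate_distance.py. Using 1 minus the Dice coefficient as the distance is an assumption, as are the function names and the handling of two empty lists.

```python
def dice_coefficient(a, b):
    """Dice coefficient of two collections of ids, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def dice_distance(a, b):
    """Distance derived from the Dice coefficient (assumed to be 1 - Dice)."""
    return 1.0 - dice_coefficient(a, b)


def distance_matrix(id_lists):
    """Symmetric matrix of pairwise Dice distances, as nested lists."""
    n = len(id_lists)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dice_distance(id_lists[i], id_lists[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Made-up id lists for three voxels, for illustration only.
for row in distance_matrix([[1, 2, 3], [2, 3, 4], [9]]):
    print(row)
```

A pure-Python double loop like this is only practical for a small number of voxels. A full brain region would need a vectorized or sparse implementation.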