Final project for the AstroML course, based on data from TDC1 Page of the challeng, including data: http://timedelaychallenge.org/
The intermediate data files are too large for GitHub. Please write to saghar.asadi@astro.su.se if you don't want to spend the CPU time to regenerate them all from the raw data in /tdc1/
- Input: tdc0/, tdc1/
- Output: TimeDelayData/pairs_with_truths.npz
Loads all the data into one large numpy array, merging the ground truth and the source data and adding a column with a globally unique id for each light curve pair.
- Input: TimeDelayData/pairs_with_truths.npz
- Output: TimeDelayData/pairs_with_truths_and_windows.npz
Calculates the sampling window boundaries and assigns a globally unique id to each sampling window. This is done by comparing the interval between two consecutive points with the average interval between two such points.
- Input: TimeDelayData/pairs_with_truths_and_windows.npz
- Output: GPModels/*
Trains a Gaussian Process model for each light curve in each sampling window, and serializes the model using pickle.
- Input: TimeDelayData/pairs_with_truths_and_windows.npz, GPModels/*
- Output: TimeDelayData/gp_resampled_of_windows_with_truth.npz
Interpolates evenly sampled data for each light curve in each window, using the trained Gaussian Process models from GP.py
- Input: TimeDelayData/gp_resampled_of_windows_with_truth.npz
- Output: timeshift_{correlate,ng_correlate,mse}_normalized[_detrend].npz
Compares the two light corves at all possible offsets using different methods. The light curves can optionally be detrended before comparison.
- Input: timeshift_{correlate,ng_correlate,mse}_normalized[_detrend].npz
- Output: timeshift_{correlate,ng_correlate,mse}_normalized[_detrend].measures.npz
Finds the maximum correlation or minimum mse and records its magnitude and offset for each window.
- Input: timeshift_{correlate,ng_correlate,mse}_normalized[_detrend].measures.npz
- Output: timeshift_{correlate,ng_correlate,mse}_normalized[_detrend].measures.pairsums.npz
Calculates the mean, median and stddev of the maximum correlation offsets of all windows for a light curve pair.
- Input TimeDelayData/timeshift_correlate_normalized_detrend.measures.pairsums.npz
- Output TimeDelayData/timeshift_correlate_normalized_detrend.measures.pairsums.ests.npz
Estimate dt from median maximum correlation offset and standard deviation of maximum correlation offset.
- Input: TimeDelayData/pairs_with_truths_and_windows.npz TimeDelayData/timeshift_correlate_normalized_detrend.measures.pairsums.ests.npz
Measure the error of the dt estimations
python measure_window_corelations.py TimeDelayData/timeshift_correlate_normalized_detrend.npz true TimeDelayData/timeshift_correlate_normalized_detrend.measures.npz
python measure_window_corelations.py TimeDelayData/timeshift_correlate_normalized.npz true TimeDelayData/timeshift_correlate_normalized.measures.npz
python measure_window_corelations.py TimeDelayData/timeshift_mse_normalized_detrend.npz false TimeDelayData/timeshift_mse_normalized_detrend.measures.npz
python measure_window_corelations.py TimeDelayData/timeshift_mse_normalized.npz false TimeDelayData/timeshift_mse_normalized.measures.npz
for x in TimeDelayData/timeshift_*.measures.npz; do
python sum_pair_corelation_measures.py $x TimeDelayData/$(basename $x .npz).pairsums.npz;
done
for x in TimeDelayData/*.pairsums.npz; do
python cluster.py $x TimeDelayData/$(basename $x .npz).clusters.npz;
done
Columns in 'pairs_with_truths_and_windows.npz':
-
From the original light curve pair files:
- time, lcA, errA, lcB, errB
-
From the path of the light curve pair files
- tdc - numerical part of tdc directory name
- rung - numerical part of rung directory name
- pair - numerical part of pair filename. Note: pairs in quad systems (A, B) have 0.0 added for A and 0.5 added for B.
-
From the corresponding truth file (duplicated for all rows in the corresponding light curve)
- dt, m1, m2, zl, zs, id, tau, sig
-
Computed values
- full_pair_id - a (globally) unique id for each combination of tdc, rung, pair
- window_id - a (globally) unique id for each sample window in the light curves