Identifying star clusters in Gaia archive using friend of friend (FoF) method and isochrone fitting.
There are 3 directories in this repo.
-
data/
:K13.dat
: Kharchenko et al. (2013)CG18.txt
andCG18b.txt
: these two catalogues together compose of the CG18 catalog.Bica.txt
: Bica et al. (2019)cal_all.txt
: full version of Tab. 1 in the paper.cat_new.txt
: full version of Tab. 3 in the paper.group/
: cluter members of 56 cluster groups.
Note:
cat_all.txt
,cat_new.txt
andgroup/sc_groupXXXX.txt
can be loaded withpandas
, e.g.:df = pd.read_csv("cat_all.txt", delim_whitespace = True, header = 0)
fof/npy
: star members of 2443 star cluster candidates, in.npy
format, can be loaded witharr = np.load()
. Note! These files might be demaged if you download them individually from this repo via web browser. I strongly recommend you down the whole repo viagit clone
if you want to use the.npy
format.fof/csv
: star members of 2443 star cluster candidates, in.csv
format, can be loaded withdf = pd.read_csv()
. Safe for individual downling with web browser.isochrone/0-10.dat
:.dat
files that contain the isochrone tables downloaded from Padova group web interface.isochrone/Z_python3.npy.tar.gz
: theZ.npy
data file. Provided in compressed format. Please uncompress it before loading:tar zxvf Z_python3.npy.tar.gz
.fit_iso/
: isochrone fitting results of 2443 cluster candidates. For Python 3:fit = np.load(name_fit_iso, allow_pickle=True, encoding='latin1').flat[0]
. For Python 2:fit = np.load(name_fit_iso).flat[0]
.
-
figure/
:4panel/
: 4-panels of 2443 star cluster candidates.
-
src/
: the GAIA SHiP pipeline, see below.
Note:
-
If you make use of SHiP in your work, we require that you quote the pipeline link
https://github.com/liulei/gaia_ship
and reference the following paper:Liu, Lei & Pang, Xiaoying, "A catalog of newly identified star clusters in GAIA DR2", 2019, ApJS, 245, 32, arXiv:1910.12600
-
Most of the programs are developed with Python 2.7.13 (provided by conda 4.5.8). If you use Python 3, please pay a attention to the following suggestions:
- Change
print " "
toprint(" ")
. - Pay attention to the difference between
/
(float division) and//
(integer division). - If failed with
np.load()
, try this:np.load(xxx.npy, allow_pickle=True)
. See additional notes for loading iso fitting files.
- Change
-
According to feedbacks from colleagues, star members in
.npy
format might be demaged and not be loadable withnp.load()
if you download them individually from this repo via web browser. We strongly recommend you download the whole repo viagit clone
if you want to use the.npy
format member list. For convenience, I have prepared the member list in csv format. This guarantees the safe downloading via web browser with a small loss of precision. -
The whole pipeline (including the data and figure) is as large as 3 GB, which is actually not easy to download from GitHub. For conveniece, I have prepared the
src.tar.gz
, just in case you are only interested with the code. If you want to run isochrone fitting, an extra fileZ_python.py.tar.gz
indata/isochrone/
is required. -
Current SHiP pipeline includes the data preparation, FoF, isochrone fitting and classification parts, so that you may construct the same catalog presented in the above paper. The data visualization part is not provided, since the programs are not well documented and the writings are messy. However they are still available upon request.
-
The installation of
mpi4py
is a little bit tricky. Fortunately it is not required if you use the core functions in single process mode, e.g. ischrone fitting, FoF clustering, see below for detailed instructions. -
The FoF clustering part and isochrone fitting part are the core of this pipeline. For those want to integrate them into your own pipeline, more detailed instructions are provided for the corresponding file below (
isochrone.py
andprocfof.py
). -
Due to the file size limitation set by GitHub (< 100 MB),
Z.npy
(~ 129 MB) cannot be uploaded directly. I provide the compressed version 'Z_python3.npy.tar.gz'. You may download it and uncompress it withtar zxvf Z_python3.npy.tar.gz
. I only provideZ.npy
forGAIA
system. To generate for your own photometry system, prepare.dat
files in Padova webpage, useload_dat()
andsave_npy()
inisochrone.py
. -
You may use
npy2csv.py
to convert the npy format member list of every individual SC candidate to csv format which is readable by topcat.
Feel free to contact me (e-mail: liulei@shao.ac.cn
, wechat: thirtyliu) if you have any problem.
Input:
- Raw GAIA DR2 data (
GaiaSource_*_*.csv.gz
file). ali-gaia_dr2_source.txt
: listing all.csv.gz
files.
Output:
gaia_segxxxx.npy
: 200 files, stored in array of structuredtype_gaia
.
Description:
- Retrieve position, proper motion, magnitude, error information from raw data, and store them in 200
gaia_seg
files. Seedtype_gaia
for details.
Input:
gaia_segXXXX.npy
: 200 files.
Output:
selXXXX.npy
: 200 files, stored in array of structuredtype_select
, which consists of two parts:idx
: star index ingaia_segXXXX.npy
param
: 5 parameters: l, b, parallax, pmra, pmdec
Description:
- Retrieve all stars that satisfy the following criterion:
- mag_G < 18
- 0.2 < parallax < 7.0 (in mas)
- |pmra| < 30 and |pmdec| < 30 (in mas/year)
- |b| < 25 degree
- Positions have been evolved from epoch 2015.5 to epoch 2000.0 according to proper motion.
Input:
selXXXX.npy
: 200 files.
Output:
gaia_partition.npy
Description:
- This serial program equal partitions the 3D parameter space (l, b, parallax) continuously, such that each partition contains roughly the same number of stars.
- The typical scale of star cluster (
l_sc
) and the dispersion of parallax (sigma_plx
) determine the minimum size of the partition in the corresponding dimension. See the program for the values.
Input:
gaia_partition.npy
: partition table, generated bypartition.py
.selXXXX.npy
Output:
stars_in_segxxxx.npy
: 200 files, each row is a list that contains star indexes for the corresponding partition.
Description:
- Assign stars to the corresponding partition according to partition table in
gaia_partition.npy
.
This pipeline is originally designed for the data processing of large number of particles. Therefore every step takes parallalization into account, which makes it complicated. Now I realize that most of cases, users only cares about a single sky area, which greatly simplifies the usage. Below is the intruction for FoF clustering usage for a single sky area.
- The FoF method can only identify cluster from a smooth background. Therefore it might not work well if your sky area is too large.
- The entry for FoF clustering is
runfof()
. You may find instructions on how to prepare data in example functionload_stars_single()
. - Some variables for controlling the FoF clustering:
self.w
: the weight of every quantity. Adjust them according to the feature of your data.self.b_fof
: the linking length. Usually 0.2 times the mean partile seperation.n_star_min
: a group is output with at least this number of stellar numbers. Change it according to your usage.
- The output of the
fof()
function isginfos
andl_keys1
. Seeoutput
below for instructions. For single sky area FoF clustering, you may care about the star member list of each group inl_keys1
.
Input:
stars_in_segXXXX.npy
selXXXX.npy
Output:
ginfos_pXXXX.npy
: 2D floating array. Each row is a list that gives the basic information of a star cluster identified in this partition.- Meaning of each element in the list: cluster star number, l, b, r_max, pmra, pmdec, r_pm, parallax, r_parallax.
keys_sel_pXXXX.npy
: each row is a list that contains all keys for that cluster. Each key is a 64 bit integer:- bit 0 - 31: star index in
selXXXX.npy
- bit 32 - 63: file index for
selXXXX.npy
- bit 0 - 31: star index in
Description:
- For each partition, the program identifies a series of star clusters using FoF algorithm, and save the basic info of the cluster and keys to the specific stars in
ginfos_pXXXX.npy
andkeys_sel_pXXXX.npy
, respectively. - Note! Star clusters given in this step are just intermediate results for each partition. They need to be further merged by
mergefof_key.py
as final product.
Input:
ginfos_pXXXX.npy
keys_sel_pXXXX.npy
Output:
ginfos_merge.npy
: basic info of each cluster after mergekeys_seg_cluster.npy
: keys (file and star index) of each cluster forgaia_segXXXX.npy
Description:
- This program merges cluster from 4311 partitions. At present the merge is just based on the intersection of keys (
keys_sel_pXXXX.npy
) from two clusters. Seeis_merge()
function for more details.nmin
is set to 50 to select those with at least 50 members after merge.
Input:
keys_seg_cluster.npy
gaia_segXXXX.npy
Output:
fof_scXXXX.npy
Description:
- This program retrieve stars from
gaia_segXXXX.npy
according to keys of each cluster recorded inkeys_seg_cluster.npy
. - Further analysis of star clusters will be totally based on
fof_scXXXX.npy
.
Input:
fof_scXXXX.npy
Z.npy
: isochrones of multiple Z and ages
Output:
fit_iso_scXXXX.npy
Description:
- This program reads
g
andb-r
info fromfof_scXXXX.npy
, fits isochrones to derive Z and age. The fitting is carried out by minizing$\bar{d^2} = \sum_{k = 1}^{n}(x_k - x_{k, nn})^2 / n$ , where$x_k = (b-r, g)$ is color and magnitude of the$k~\mathrm{th}$ cluster member,$x_{k, nn}$ is the nearest neighbor of$k~\mathrm{th}$ member in the isochrone.
This file provides all the necessary functions to for isochrone fitting. Below is a summary of its usage.
-
The main function for isochrone fitting is
fit_age_Z()
. An example is provided by fit():iso = ISO() iso.load_npy('Z_python3.npy') d_fit = iso.fit_age_Z(g, b_r)
g
and b_r
are numpy arrays for photometry. d_fit
is a dict that stores the fitting result. You may load it with np.load(fit_iso_name, allow_pickle=True).flat[0]
. Pay attention to the extra encoding='latin1'
param if you load our fitting results with Ptython 3.
-
Originally I expect the users generate their
Z.npy
usingload_dat()
andsave_npy()
for their own photometry data. However it seems more convenienet to provideZ.npy
directly. Please download it fromdata/isochrone/Z_python3.npy.tar.gz
and uncompress it withtar zxvf Z_python3.npy.tar.gz
. -
As mentioned in the paper, there is a magnitude cut
g < 17
for bright stars. You may change or discard this criteria. -
For efficiency, the fitting only deals with a maximum of 1000 stars, which is enough for most of star cluster cases. This is controlled by
nmax
. You may change or discard this criteria. -
The fitting is only tested on apparent magnitude. You may try with absolute magnitude. If you succeed, please let me know.
-
You may want to plot the fitting result. I provide this function in
showisofit.py
. The input isg
,b_r
and the name of your fitting result file. Or just modify it to pass the fittign result dict. The problem is pretty simple.
Below is old information on generating Z.npy
for your own photometry system.
Input:
.dat
files that contain isochrone information downloaded from Padova group web interface (http://stev.oapd.inaf.it/cgi-bin/cmd).
Output:
Z.npy
: metallicity and age table generated from.dat
files.
Description:
The ISO
class in this file provides the following functions:
load_dat()
andsave_npy()
: generateZ.npy
from.dat
files.
Input
:
-
fit_iso_scXXXX.npy
: isochrone fitting result ($t_\mathrm{age}$ ,$\bar{d^2}$ ) -
fof_scXXXX.npy
:g
andb-r
for narrowness ($r_\mathrm{n}$ ) calculation,$n_{g<17}$
Output
:
-
sc_info.txt
: id_sc, ntot,$n_{g<17}$ ,$\bar{d^2}$ ,$r_\mathrm{n}$ ,$Z$ (logMsol),$t_\mathrm{age}$ (in Gyr),$\Delta_g$ ,$\Delta_{b-r}$ ,$d_\mathrm{plx}$ ,$d_\mathrm{iso}$ , classification
Description
:
This program reads sc info and classifies them.
Input
:
ginfos_merge.npy
: position and radius of star clusterssc_info.txt
: classification infoK13.dat
(K13, Kharchenko 2013)CG18.dat
andCG18b.dat
(CG18, Cantat-Gaudin 2018a,b)B19.dat
(B19, Bica et al. 2019)
Output
: cross matching catalog
- col 1: line No. (0 indexed) in `ginfos_merge.npy`
- col 2: position seperation in two catalogs
- col 3: radius in `ginfos_merge.npy`
- col 4: radius in reference
- col 5: line No. (0 indexed) in K13, CG18 and B19
- col 6: classification
Description
:
- This program cross matches the FoF sc of this work with K13, CG18 and B19. Two clusters are regarded as matched if their distance is smaller than both of their radii.
Input
:
cat_all.txt
Output
:
sc_groupXXXX.txt
: cluster members of a group.sc_groups.txt
: number of clusters in each group.
Description:
- This program reads all the class 1 cluster candidates, identifies cluster groups with standard FoF method, outputs groups that contain at least 2 members.
Note:
- In this program and only in this program, we use the
$d$ ,$l$ ,$b$ to$X$ ,$Y$ ,$Z$ conversion described by Eq. 3 of Conrad et al. (2017). This is different from the commonly used conversion adopted in our paper (Fig. 8).