Software to simulate and analyse Age Structured PopGen Data
Linux (64-bit)
Python 2.7
simuPOP
Biopython
Genepop (1)
LDNe (1,2)
pygenomics
matplotlib
seaborn
1 - Only if you do genetic analysis
2 - Binary is supplied here
There are 3 parts to the analysis
- Data simulation
- Non-genetic analysis
- Genetic analysis
You might want to skip 2 or 3 (e.g to repeat the Evolution paper, you do not need genetic analysis)
Some of these calculations can be very slow or very computationally demanding...
You can start by running the simulator on one scenario, for example:
python sim.py ldne/100grizzly.conf example
Will run an age structured population of Grizzlies with a N1 of 100 and write the output on example.* files
This will generate 2 files:
example.sim - non-genetic output (generation, replicate, individual id sex and parent id)
example.gen - the genome of each individual
Typically you will want to run several replicates. For that you can run
bash doSim.sh 100grizzly 0 20 ldne
This will do 20 replicates (0 to 19) and store them on data/ldne/100grizzly* . Ignore the rm errors.
If you desire you can run replicates in parallel (i.e. run several instances of the above, see e.g doSimAll.sh which runs 20 replicates - 10 at a time). You will need to have enough CPU power and memory for this!
Generate non-genetic raw information (vk, offspring count, ...)
python totalAll.py ldne/var-100grizzly.conf
(see ldne/var*conf for examples)
Generate summary statistics
python demo.py ldne/var-100grizzly.conf > demo.txt
python nb3.py ldne/var-100grizzly.conf > nb3.txt
These files can be loaded into Excel (space delimited)
Heterozygosity
Do:
bash doH.sh 100grizzly 0 20 ldne 100 0
0 20 are the start and end replicates (0 to 19)
ldne is the directory name
100 is the number of generations
0 is the loci to start the analysis (0 - 100 are normally MSats, 100 onwards are SNPs). 100 loci will be used to compute H.
This will generate hz computations on data/ldne/1000grizzly-*hz
The models on the directory hz simulate 600 years, probably more appropriate to test Hz than the standard models (100 years)
LDNe estimates
Do:
python totalAll.py trout/var.conf gen
This will run LDNe for the scenarios in trout/var.conf. Configuration for the sampling strategy will be done in trout/topGo.sh
[This assumes LDNe was run]
python nb3.py trout/var.conf > output/trt-nb3.txt 2> output/trt-nb.txt
Heterozygosity levels on the simulations need to be computed BEFORE the main study is done. After hz do:
python sumTrout.py trout/var.conf trt
Errors starting with err can be ignored. There will be an exception at the end.
Run: python analyzeHz.py trout
Now do: python sumTrout.py trout/var.conf trt
Notebooks are now ready to be used.
And now all should be fine (again, ignore the err messages). This is convoluted but works!