ericmjl/polymorphism-sampler

polymorphism-sampler

A Python utility for evenly downsampling polymorphisms from a population of sequences.

Procedure

  1. Start with a sequence alignment.
  2. Collect all positions that show polymorphisms, i.e. positions that are not 100% conserved.
  3. Randomly pick one position.
  4. At that position, randomly pick one of the polymorphisms.
  5. Filter the sequences, keeping only those that have that polymorphism at that position.
  6. Randomly pick one of the remaining sequences.
  7. Figure out which other polymorphisms are covered by that sequence, and remove them from consideration.
  8. Add the chosen sequence to a collated set, and remove it from further consideration.
  9. Repeat until either:
    1. No more polymorphisms need to be found, or
    2. No more sequences are available.
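
The procedure above can be sketched in a few lines of Python. This is a minimal illustration and not the repository's actual code: the function names `find_polymorphisms` and `sample_sequences`, the `seed` parameter, and the choice to represent the alignment as a dict mapping sequence names to equal-length strings are all assumptions made for the example.

```python
import random


def find_polymorphisms(alignment):
    """Return the set of (position, residue) pairs found at non-conserved columns.

    `alignment` is assumed to be a dict of name -> aligned sequence string,
    with all sequences the same length (hypothetical input format).
    """
    polymorphisms = set()
    length = len(next(iter(alignment.values())))
    for pos in range(length):
        residues = {seq[pos] for seq in alignment.values()}
        if len(residues) > 1:  # column is not 100% conserved
            polymorphisms.update((pos, r) for r in residues)
    return polymorphisms


def sample_sequences(alignment, seed=None):
    """Pick sequence names until every polymorphism is covered (steps 3 to 9)."""
    rng = random.Random(seed)
    uncovered = find_polymorphisms(alignment)  # polymorphisms still to find
    available = dict(alignment)                # sequences not yet picked
    chosen = []
    while uncovered and available:
        # Step 3: randomly pick one position that still has uncovered polymorphisms.
        pos = rng.choice(sorted({p for p, _ in uncovered}))
        # Step 4: randomly pick one uncovered polymorphism at that position.
        residue = rng.choice(sorted(r for p, r in uncovered if p == pos))
        # Step 5: keep only the sequences carrying that polymorphism.
        # This list is never empty: a sequence carrying an uncovered
        # polymorphism cannot have been picked already.
        candidates = sorted(n for n, s in available.items() if s[pos] == residue)
        # Step 6: randomly pick one of them.
        name = rng.choice(candidates)
        # Step 8: remove the sequence from further consideration.
        seq = available.pop(name)
        # Step 7: drop every polymorphism this sequence covers.
        uncovered -= set(enumerate(seq))
        chosen.append(name)
    return chosen


if __name__ == "__main__":
    toy_alignment = {"a": "ACGT", "b": "ACGA", "c": "TCGT"}
    print(sample_sequences(toy_alignment, seed=0))
```

Sorting the candidates before each random choice keeps the result reproducible for a given seed, since set iteration order is not something to rely on. In this toy alignment, sequences `b` and `c` are always selected because each is the only carrier of one polymorphism.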
