serengeti-analysis-scripts

Various PyMongo and other scripts for analysing the Serengeti DB.

My general approach is that each script will write its output to a pickle file, for use in the next script. This means you can re-run part of the analysis as you tweak the scripts without having to run everything again from scratch.
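The pickle hand-off between scripts can be sketched roughly as follows. This is a minimal illustration rather than the code the scripts actually use: the helper names `save_stage` and `load_stage` are hypothetical, and only the standard-library `pickle` module is assumed.

```python
import pickle
from pathlib import Path


def save_stage(result, path):
    """Write one script's result to a pickle file for the next script."""
    path = Path(path)
    # Create the output folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(result, f)


def load_stage(path):
    """Load the result that an earlier script wrote."""
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

A later script would then call something like `load_stage("subject_season_map.p")` instead of recomputing the mapping, which is what makes it possible to re-run only part of the analysis.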

Order to run the scripts:

Run some tests to see what classifications you have:
1. (optional) Run python validate-classifications-by-date-full-iteration.py. This checks the dates of all classifications by iterating over every classification.
2. (optional) Run python validate-classifications-by-date-paged.py. This performs the same date check, but pages through the classifications instead.
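The difference between the two date checks can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `created_at` field name, the per-month tally, and the function names are hypothetical, and plain dictionaries stand in for the documents a PyMongo query would return.

```python
from collections import Counter
from datetime import datetime
from itertools import islice


def count_by_month(classifications):
    """Tally classifications per month in one pass over all of them."""
    counts = Counter()
    for doc in classifications:
        # "created_at" is a hypothetical field name.
        counts[doc["created_at"].strftime("%Y-%m")] += 1
    return counts


def count_by_month_paged(classifications, page_size=2):
    """Build the same tally, consuming one fixed-size page at a time."""
    counts = Counter()
    iterator = iter(classifications)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            break
        counts.update(count_by_month(page))
    return counts


sample = [
    {"created_at": datetime(2012, 12, 11)},
    {"created_at": datetime(2012, 12, 30)},
    {"created_at": datetime(2013, 1, 2)},
]
```

Both functions should return the same tally for the same input, so running the two variants against each other is one way to cross-check the result.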

Create the season mapping:
1. Run python get-seasons.py first. This generates a mapping of which subjects are in each season and writes it to subject_season_map.p.
2. (optional) Run python validate-subject-season-map.py, which loads the season map pickle file and summarises it.
3. (optional) Run python validate-seasons-present-in-classifications.py. This checks which seasons' images have classifications and prints the results to the screen.
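The two optional season checks above could look something like this. This sketch assumes the season map is a dictionary from subject ID to season number and that each classification carries a `subject_id` field. Neither detail is confirmed by this README, and the function names are hypothetical.

```python
from collections import Counter


def summarise_season_map(subject_season_map):
    """Count how many subjects fall in each season."""
    return Counter(subject_season_map.values())


def seasons_with_classifications(classifications, subject_season_map):
    """Return the set of seasons that have at least one classified subject."""
    seasons = set()
    for doc in classifications:
        # "subject_id" is a hypothetical field name.
        season = subject_season_map.get(doc["subject_id"])
        if season is not None:
            seasons.add(season)
    return seasons
```

Subjects that are missing from the map are skipped here rather than raising an error, which is a design choice of the sketch and not necessarily what the real script does.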

Inspect the mix of users in each season:
1. Run python get-user-mix.py. This scans the classifications DB, referencing the subject/season mapping, and records which users were encountered in each season and how many anonymous users were in each season. The results are written to two new pickle files (known_users.p and anon_users_counts.p).
2. (optional) Run python validate-known-users.py, which loads the known users file and summarises it.
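The user-mix scan can be sketched along these lines. The field names `subject_id` and `user_name` are assumptions, as is treating a missing `user_name` as anonymous. Because this README does not say how distinct anonymous users are identified, the sketch simply counts anonymous classifications per season as a stand-in.

```python
from collections import defaultdict


def build_user_mix(classifications, subject_season_map):
    """Collect known users per season and count anonymous classifications."""
    known_users = defaultdict(set)
    anon_counts = defaultdict(int)
    for doc in classifications:
        season = subject_season_map.get(doc["subject_id"])
        if season is None:
            # Subject not in the season map: skip it.
            continue
        user = doc.get("user_name")  # hypothetical field name
        if user:
            known_users[season].add(user)
        else:
            anon_counts[season] += 1
    return dict(known_users), dict(anon_counts)
```

The two returned dictionaries correspond loosely to the contents of known_users.p and anon_users_counts.p, although the real files may be structured differently.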

Generate a CSV summary:
1. Run python get-summary.py. This loads the known users and anonymous users files and generates a CSV containing the number of new, returning and anonymous users per season.
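One way the per-season summary could be computed is shown below. This sketch assumes that a "new" user is one not seen in any earlier season and a "returning" user is one who was. That definition, the column names, and the function names are all assumptions rather than something this README specifies.

```python
import csv
import io


def summarise(known_users, anon_counts):
    """Build one row per season with new, returning and anonymous counts."""
    rows = []
    seen = set()  # users encountered in earlier seasons
    for season in sorted(known_users.keys() | anon_counts.keys()):
        users = known_users.get(season, set())
        new = users - seen
        rows.append({
            "season": season,
            "new": len(new),
            "returning": len(users) - len(new),
            "anonymous": anon_counts.get(season, 0),
        })
        seen |= users
    return rows


def write_csv(rows, stream):
    """Write the summary rows as CSV to an open text stream."""
    fields = ["season", "new", "returning", "anonymous"]
    writer = csv.DictWriter(stream, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
```

Passing an `io.StringIO()` buffer to `write_csv` is a convenient way to inspect the output before writing it to a file.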

