group_2

CS122 group project

Summary: This project analyzes gender disparities across academic disciplines and institutions. We scrape authorship data from journal websites and article search engines, assign gender to author names, and analyze the distribution of gender in specific academic disciplines, institutions, and geographic regions.

To run website: You run this file runing "google-chrome website.html" in the terminal.

Group Members: Lily Mansfield, Peter Laurin, Maxwell Kay, Sadie Morriss

Note: Our scripts were reorganized for clarity's sake and have been organized such that running our scripts will not overwrite final versions of our final database (journals.db) and our generated html files for our pyplots that are used by the website. If scripts are run they will recreate our work in files/directories marked by the prefix "dummy" (i.e. if you attempt to rerun our database scripts they will start building a new database called dummy_journals.db). We have also added notes to the top of our scripts restating this information.

Code description:

make_database/ : This directory contains scripts used to generate and clean our final database of authors and papers (journals.db). To recreate our database:

create_database.py: create the SQL database with three tables (authors, papers, author_key_rank)
nature_crawler.py: crawls the Nature journals, adding to the SQL database
PLOS_webcrawler.py: crawls the PLOS One journal, adding to the SQL database
database_cleaning.py: cleans the database (rewrite different spellings of top countries. remove "the" from institutions, etc.)

make_plots/ : This directory contains scripts used to generate pyplot graphs and to generate the html files for the graphs for the website based on our final journals.db database. We created html files for all the pyplots based on the selection criteria from the website. We had filtering for country, institution, field, rank (place in order of authors). If country and institution are NOT selected (default 'all' is selected) then you can also filter for minimum number of authors (graph will only show institutions or countries with more than 10, 50, 100 or 500 authors). To recreate html files for our graphs:

create_html_plots.py: creates html files of graphs for all possible filtering options for the website to display. Uses:
- plot_functions.py: functions to create plots for gender breakdown by country, gender breakdown by field, gender breakdown by institution, gender breakdown by rank.

make_website/ : This directory includes the files used to crease the website. First, we found the names of all the files in pyplot_htmls_final via the terminal and saved that in plots.txt. Make_js_plots_final.py took plots.txt to iterate through the html files in plots_html_final, extract the javascript portion of the html, and write a javascript file that includes each plot javascript as a function. We called the resulting javascript file java_functions_final, added an additional function, and then referenced this file through website.html. -plot.txt: a text file with the names of all the files in pyplot_htmls_final (at the time we made it) -make_js_plots_final.py: writes javascript file as described above -java_functions_final.js; javascript functions (on_press() and one function per plot) that are used by website.html

GENERAL FILE LIST

Main directory:

website.html: the html code for the website, with embedded javascript

journals.db: the database of author information, paper information, and author rankings in each paper

Directories/files:

make_database/: scripts to recreate journals.db

nature_crawler.py: a python script that crawls Nature journals and extracts authors and authorship information
PLOS_webcrawler.py: a python script that crawls PLOS_ONE and extracts authors and authorship information
gender.py: a utility function that assigns gender to authors
create_database.py: a script that sets up our database of author information, paper information, and author rankings in each paper
database_cleaning.py: a set of utility functions to clean data and make database more accurate

make_website/: scripts that made javascript files used by website.html

make_js_plots_final.py: takes html for plots and extracts the javascript from them, writes a javascript file with each plot as a function, changes the div ids of the plots to match div ids from website.html
java_functions_final.js: includes helper javascript functions for the website.html, including every plot as a javascript function
plots.txt: the names of the files in pyplot_htmls_final

make_plots/: scripts to create pyplots and their html files based on journals.db

create_html_plots.py: a script that iterates through all possible combinations of filters (country, field, rank, institution, minimum number of authors) from our website dropdown menus and generates hmtl code for each plot
plot_functions.py: a script that generates pyplots breaking down gender percentages based on country, field, rank and institution

regression/: exploratory regression analysis

author_regression.ipynb: a jupyter notebook regressing gender on some of our data and getting a priority of data to visualize on the website

pyplots_html_final/: html files for all graphs to be displayed on website based on user filtering

test_code/: code used to test and build various scripts

project_proposal/: project proposal documents

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make_database

make_database

make_plots

make_plots

make_website

make_website

project_proposal

project_proposal

pyplot_htmls_final

pyplot_htmls_final

regression

regression

test_code

test_code

README.md

README.md

journals.db

journals.db

website.html

website.html

Repository files navigation

group_2

GENERAL FILE LIST

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
make_database		make_database
make_plots		make_plots
make_website		make_website
project_proposal		project_proposal
pyplot_htmls_final		pyplot_htmls_final
regression		regression
test_code		test_code
README.md		README.md
journals.db		journals.db
website.html		website.html

peterlaurin/group_2

Folders and files

Latest commit

History

Repository files navigation

group_2

GENERAL FILE LIST

About

Resources

Stars

Watchers

Forks

Languages