CS1980 Data Mining for Student Scheduling

Project Group: Alex Kim, Matt Darden, Ruth Dereje

Project Supervisors: Dr. Daniel Mosse, Nathan Ong

PROJECT INTRODUCTION

At the University of Pittsburgh, undergraduate students have a wide variety of options when scheduling their courses. In the current system, students often have to choose from an extensive list of classes knowing only the basic general education requirements and the classes required for their major. To build their schedule, students must meet with their school advisor. However, advisors have only screenshots from other schedule-building software and their own prior knowledge to work with. The decision process can be overwhelming both for students and for advisors trying to find the best options for them. Factors such as professors, course difficulty, workload, and prerequisites make scheduling far more difficult than following a predetermined or generic suggested path.

Our goal is to analyze student data within the computer science major to build a tool that will find correlations pertinent to helping students and advisors predict the best path to academic success.

HYPOTHESIS 1

If a student receives a lower-than-average grade in CS 401, then they will receive a lower-than-average grade in CS 445.
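
One way to check this hypothesis is to pair each student's CS 401 and CS 445 grades, then measure both the Pearson correlation and how often a below-average CS 401 grade is followed by a below-average CS 445 grade. This is a minimal sketch, assuming grades have already been converted to numeric grade points; the function names and input shape are hypothetical and not taken from hypothesis_one.py.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (sx * sy)


def below_average_followthrough(pairs):
    """Share of students below the class1 mean who are also below the class2 mean.

    pairs: iterable of (class1_grade_points, class2_grade_points), one per student
    (hypothetical input shape).
    """
    pairs = list(pairs)
    m1 = mean(a for a, _ in pairs)
    m2 = mean(b for _, b in pairs)
    low_first = [b for a, b in pairs if a < m1]  # class2 grades of below-average class1 students
    if not low_first:
        return 0.0
    return sum(1 for b in low_first if b < m2) / len(low_first)


# Hypothetical example data: (CS 401 grade points, CS 445 grade points) per student.
pairs = [(4.0, 3.7), (3.3, 3.0), (2.0, 2.3), (1.7, 1.0)]
print(pearson_r([a for a, _ in pairs], [b for _, b in pairs]))
print(below_average_followthrough(pairs))
```

A high correlation alone does not show that below-average students stay below average, which is why the sketch reports both numbers.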

HYPOTHESIS 2

If a student has a semester buffer between a lower-level course (CS 445) and its corresponding upper-level course (CS 1501), then their grade in the upper-level course will be lower than that of a student with a similar prerequisite grade who took the upper-level course in the semester immediately after the prerequisite.
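
A sketch of how this comparison could be organized: compute the number of semesters between the two courses (0 meaning back-to-back, matching the 0-4 range used by the 'time' test), then average the upper-level grade for each combination of prerequisite grade and gap. It assumes terms are encoded as consecutive integers and approximates "a similar grade" by an identical prerequisite grade; both choices, and all names below, are hypothetical rather than taken from hypothesis_two.py.

```python
from collections import defaultdict
from statistics import mean


def semesters_between(prereq_term, upper_term):
    """Semesters strictly between two consecutive-integer term indices.

    0 means the upper-level course was taken the semester right after the prerequisite.
    """
    return upper_term - prereq_term - 1


def mean_upper_grade_by_gap(records):
    """Mean upper-level grade for each (prerequisite grade, semester gap) group.

    records: iterable of (prereq_grade, prereq_term, upper_grade, upper_term).
    Students with the same prerequisite grade stand in for "a similar grade".
    """
    groups = defaultdict(lambda: defaultdict(list))
    for pre_grade, pre_term, up_grade, up_term in records:
        groups[pre_grade][semesters_between(pre_term, up_term)].append(up_grade)
    return {
        pre_grade: {gap: mean(grades) for gap, grades in sorted(by_gap.items())}
        for pre_grade, by_gap in groups.items()
    }


# Hypothetical records: (CS 445 grade, CS 445 term, CS 1501 grade, CS 1501 term).
records = [(3.0, 1, 3.5, 2), (3.0, 4, 3.0, 5), (3.0, 1, 2.5, 3), (4.0, 2, 4.0, 3)]
print(mean_upper_grade_by_gap(records))
```

Under the hypothesis, within each prerequisite-grade group the mean for gap 0 would be higher than the means for gaps of 1 or more.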

HYPOTHESIS 3

If a student has taken CS 447, then they will, on average, receive a higher grade in CS 449 than a student who has not taken CS 447.
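
Because all hypotheses query a SQLite database, this comparison can be expressed as a single query that splits CS 449 students by whether they also have a CS 447 record. The schema here (a grades table with student_id, course, and grade_points columns, and course codes written like 'CS 0447') is a hypothetical stand-in; the actual layout created by db.py is not described in this README, and the function name is illustrative only.

```python
import sqlite3

# Hypothetical schema: grades(student_id, course, grade_points), one row per
# student per course. The real capstone.sqlite layout may differ.
QUERY = """
SELECT took_447, AVG(grade_points)
FROM (
    SELECT
        EXISTS (
            SELECT 1 FROM grades AS g2
            WHERE g2.student_id = g.student_id AND g2.course = 'CS 0447'
        ) AS took_447,
        g.grade_points
    FROM grades AS g
    WHERE g.course = 'CS 0449'
)
GROUP BY took_447
"""


def mean_449_by_447(conn):
    """Return {True: mean CS 449 grade with CS 447, False: mean without}."""
    return {bool(took): avg for took, avg in conn.execute(QUERY)}


if __name__ == "__main__":
    # Small in-memory demo with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grades (student_id INTEGER, course TEXT, grade_points REAL)")
    conn.executemany(
        "INSERT INTO grades VALUES (?, ?, ?)",
        [(1, "CS 0447", 3.0), (1, "CS 0449", 4.0), (2, "CS 0449", 2.0)],
    )
    print(mean_449_by_447(conn))
```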

USAGE GUIDE

FORMATTING THE DATA

Format the provided data files:
1. Delete the header row in each file.
2. Export each file as a .csv file.
3. Create a folder named 'anonymized_data'.
4. Inside that folder, create another folder named 'csv'.
5. Add the formatted data files to the 'csv' folder.
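
If the provided files are already comma-separated text, the header removal and folder layout above can be scripted. This is a minimal sketch under that assumption (spreadsheet files would still need to be exported from the spreadsheet program first); format_data_file is a hypothetical helper, not part of the repository.

```python
import csv
from pathlib import Path


def format_data_file(src, out_dir=Path("anonymized_data") / "csv"):
    """Copy a comma-separated file into out_dir as <name>.csv without its header row."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # creates anonymized_data/csv if missing
    dest = out_dir / (src.stem + ".csv")
    with src.open(newline="") as fin, dest.open("w", newline="") as fout:
        reader = csv.reader(fin)
        next(reader, None)  # drop the header row
        csv.writer(fout).writerows(reader)
    return dest
```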

ADDING TESTS BETWEEN CLASSES

Create a file named 'classes.txt'.
1. Add one test per line in the following format: class1_department class1_number class2_department class2_number test_type
  - Ex: 'CS 0401 CS 0445 grades'
  - test_type can be either 'grades' or 'time'
    - 'grades' shows the correlation between the grades in the two classes, i.e., whether the grade in class1 is related to the grade in class2
    - 'time' shows whether the number of semesters (0-4) between class1 and class2 has an impact on the grade in class2
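
The line format above is simple enough to validate before running anything. This sketch parses classes.txt lines into records and rejects malformed lines; ClassTest and parse_classes are hypothetical names, and class_correlations.py may parse the file differently.

```python
from dataclasses import dataclass

VALID_TEST_TYPES = {"grades", "time"}


@dataclass(frozen=True)
class ClassTest:
    class1: str  # e.g. "CS 0401"
    class2: str  # e.g. "CS 0445"
    test_type: str  # "grades" or "time"


def parse_classes(lines):
    """Parse classes.txt lines into ClassTest records, skipping blank lines."""
    tests = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        dept1, num1, dept2, num2, test_type = fields
        if test_type not in VALID_TEST_TYPES:
            raise ValueError(f"line {lineno}: unknown test_type {test_type!r}")
        tests.append(ClassTest(f"{dept1} {num1}", f"{dept2} {num2}", test_type))
    return tests


# Usage with the real file:
# with open("classes.txt") as f:
#     tests = parse_classes(f)
print(parse_classes(["CS 0401 CS 0445 grades", "", "CS 0445 CS 1501 time"]))
```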

HOW TO RUN ALL SCRIPTS

From the command line:
1. Type chmod +x run_all.sh and press Enter.
2. Type ./run_all.sh and press Enter.
3. When prompted for a database for hypothesis_one, type databases/capstone.sqlite and press Enter.
4. When prompted for a database for hypothesis_two, type databases/capstone.sqlite and press Enter.

The output for all three hypotheses should appear in the command line.
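
For repeated runs, the two database prompts could be answered automatically by piping the paths to the script's standard input. This sketch assumes the hypothesis scripts read each path from standard input (for example with Python's input()), one line per prompt, in the order listed above; run_with_databases is a hypothetical helper, not part of the repository.

```python
import subprocess


def run_with_databases(command, db_paths):
    """Run command, answering each database prompt in order with one path.

    Assumes the scripts read the database path from standard input,
    one line per prompt.
    """
    answers = "".join(f"{path}\n" for path in db_paths)
    result = subprocess.run(
        command, input=answers, capture_output=True, text=True, check=True
    )
    return result.stdout


# Usage (after chmod +x run_all.sh), from the repository root:
# print(run_with_databases(["./run_all.sh"],
#                          ["databases/capstone.sqlite", "databases/capstone.sqlite"]))
# The same call works for run_tests.sh with databases/unittest1.sqlite
# and databases/unittest2.sqlite.
```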

HOW TO RUN ALL TEST SCRIPTS

From the command line:
1. Type chmod +x run_tests.sh and press Enter.
2. Type ./run_tests.sh and press Enter.
3. When prompted for a database for hypothesis_one, type databases/unittest1.sqlite and press Enter.
4. When prompted for a database for hypothesis_two, type databases/unittest2.sqlite and press Enter.

The output of the tests for hypothesis_one and hypothesis_two should appear in the command line.

HOW TO RUN TESTS FOR OTHER CLASSES

Add every test you want to run to the 'classes.txt' file, following the format above.
From the command line, type python class_correlations.py and press Enter.

The output for each test should appear in the command line, preceded by the line from classes.txt that produced it.

FILE DESCRIPTIONS

databases folder - holds all of the SQLite databases and the Python scripts that create them
- db.py - creates the main capstone.sqlite database that all hypotheses query
- test_db1.py - creates the unittest1.sqlite database for testing hypothesis_one
- test_db2.py - creates the unittest2.sqlite database for testing hypothesis_two
midterm test data folder - contains fake data we created to test our skeleton code for the midterm progress presentation
old folder - contains old skeleton code that we used before we received the official data
testdata folder - contains fake data used for unit testing
hypothesis_one.py - implements a solution to our first hypothesis
hypothesis_two.py - implements a solution to our second hypothesis
hypothesis_three.py - implements a solution to our third hypothesis
test_hypothesis_one.py - unit tests for the hypothesis_one implementation
test_hypothesis_two.py - unit tests for the hypothesis_two implementation
run_all.sh - shell script that creates the database and runs all the hypotheses
run_tests.sh - shell script that creates the test databases and runs all the unit tests for hypothesis 1 and hypothesis 2
corrstats.py - functions for calculating statistically significant differences between two dependent or independent correlation coefficients. Author: Philipp Singer (www.philippsinger.info)
class_correlations.py - runs the properly formatted tests listed in classes.txt
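
For the independent case that corrstats.py covers, the standard technique is Fisher's z-test: transform each correlation with atanh, then compare the difference against its standard error. The sketch below is a self-contained illustration of that test, not the code in corrstats.py, and its function name is hypothetical. Dependent correlations (for example, two correlations measured on the same students) need a different test, such as Steiger's, which is not shown here.

```python
from math import atanh, erfc, sqrt


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed Fisher z-test for the difference between two independent
    Pearson correlations r1 (sample size n1) and r2 (sample size n2).

    Returns (z, p). Requires n1, n2 > 3 and |r| < 1.
    """
    if min(n1, n2) <= 3:
        raise ValueError("each sample needs more than 3 observations")
    z1, z2 = atanh(r1), atanh(r2)  # Fisher r-to-z transformation
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = erfc(abs(z) / sqrt(2))  # two-tailed p-value under the standard normal
    return z, p


# Hypothetical example: r = 0.6 in one cohort of 50 students vs r = 0.2 in another.
print(compare_independent_correlations(0.6, 50, 0.2, 50))
```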
