Seng 474 Data Mining Project

Team

Alastair Beaumont
Kolby Chapman
Graeme Nathan
Cole Peterson

Data

Data We Have

Home team (str)
Away team (str)
Home team's score (int >= 0)
Away team's score (int >= 0)
Match winner (str):
- H: home team
- A: away team
- D: draw
Date of match (date - dd/mm/yy)
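
As a minimal sketch of how one row of this data could be represented and validated, the record type below mirrors the fields listed above. The names `Match` and `parse_match` and the field names are assumptions made for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class Match:
    home_team: str
    away_team: str
    home_score: int
    away_score: int
    winner: str  # "H" (home), "A" (away), or "D" (draw)
    played_on: date


def parse_match(home, away, home_score, away_score, winner, date_text):
    """Build a Match from raw string fields, validating each one."""
    h, a = int(home_score), int(away_score)
    if h < 0 or a < 0:
        raise ValueError("scores must be non-negative")
    if winner not in ("H", "A", "D"):
        raise ValueError(f"unknown winner code: {winner!r}")
    # Dates are given as dd/mm/yy.
    played_on = datetime.strptime(date_text, "%d/%m/%y").date()
    return Match(home, away, h, a, winner, played_on)
```

Parsing the date once, at load time, lets later steps sort matches chronologically without repeating string handling.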

Data We Want

Relegation score (float in [0, 1])

Relegation Score

This is calculated as 1 - (pts/team_max), where pts is the number of points the team has earned so far. It is a ratio in the range [0, 1]: the higher a team's score, the closer they are to being relegated. A team that has won every game scores 0, and a team with no points scores 1.

The team_max is 3 times the number of games that team has played, representing the theoretical maximum number of points the team could have at that point in time.
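
The formula above can be sketched as a small function. The name `relegation_score` and the choice to raise an error before a team has played any games (when team_max would be 0) are assumptions for illustration.

```python
def relegation_score(points, games_played):
    """Return 1 - points / team_max, where team_max = 3 * games_played."""
    if games_played <= 0:
        # team_max would be 0, so the ratio is undefined.
        raise ValueError("score is undefined before a team has played")
    team_max = 3 * games_played
    if not 0 <= points <= team_max:
        raise ValueError("points must lie between 0 and team_max")
    return 1 - points / team_max
```

For example, a team with 15 points after 10 games has a team_max of 30 and a score of 0.5.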

Plan

We will build a baseline algorithm as a point of comparison for a machine-learning approach.

A Baseline Algorithm

We will implement a prediction algorithm that always predicts each team's outcome in the upcoming match as the mode of that team's previous X matches. This will serve as a comparison, along with the theoretical random-guessing rate of 33% (one of three possible outcomes), for our machine learning implementations.

If both teams in a match are predicted to have the same outcome as one another (for example, both are predicted to win), we will treat this as a prediction of a draw.
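
A minimal sketch of this baseline follows. The function names and the `(home, away, winner)` tuple layout are assumptions. Two details are not specified above and are also assumptions here: ties for the mode go to the most recent result, and when the two teams' predictions differ, the match goes to the team with the better predicted result.

```python
from collections import Counter

RANK = {"L": 0, "D": 1, "W": 2}


def team_results(matches, team):
    """Outcomes ("W", "D", "L") for `team`, oldest first.

    `matches` holds (home, away, winner) tuples in date order,
    with winner in {"H", "A", "D"}.
    """
    results = []
    for home, away, winner in matches:
        if team not in (home, away):
            continue
        if winner == "D":
            results.append("D")
        elif (winner == "H") == (team == home):
            results.append("W")
        else:
            results.append("L")
    return results


def predicted_form(results, x):
    """Mode of the last `x` results, or None when there is no history."""
    if x < 1:
        raise ValueError("x must be at least 1")
    recent = results[-x:][::-1]  # most recent first
    if not recent:
        return None
    # most_common keeps first-seen order among equal counts,
    # so a tie goes to the most recent result (assumption).
    return Counter(recent).most_common(1)[0][0]


def predict(matches, home, away, x):
    """Predict "H", "A", or "D"; None if either team has no history."""
    home_form = predicted_form(team_results(matches, home), x)
    away_form = predicted_form(team_results(matches, away), x)
    if home_form is None or away_form is None:
        return None
    if home_form == away_form:
        return "D"  # same predicted outcome for both teams means a draw
    # Assumption: otherwise the team with the better predicted result wins.
    return "H" if RANK[home_form] > RANK[away_form] else "A"
```

Calling `predict` over a range of `x` values and comparing against the actual results gives the accuracy for each history length.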

Machine Learning

We will take two similar approaches in this area.

First, we will use our data without the relegation score. Second, we will repeat the process with the relegation score. In both cases, we will use one season as training data and another season as testing data.
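
The two runs could share one evaluation harness, sketched below under several assumptions: each match is a dict with hypothetical keys `season`, `winner`, `home_relegation`, and `away_relegation`; the model is any object with `fit` and `predict` methods; and `MajorityClass` is only a stand-in learner, since the actual model is not chosen here.

```python
from collections import Counter

# Hypothetical column names for the two teams' relegation scores.
RELEGATION_KEYS = ("home_relegation", "away_relegation")


def split_by_season(rows, train_season, test_season):
    """Use one season for training and another for testing."""
    train = [r for r in rows if r["season"] == train_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test


def features(row, use_relegation):
    """Input features for one match; label and season are never included."""
    skip = {"winner", "season"}
    if not use_relegation:
        skip.update(RELEGATION_KEYS)
    return {k: v for k, v in row.items() if k not in skip}


class MajorityClass:
    """Stand-in learner: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def run_experiment(rows, train_season, test_season, use_relegation, model):
    """Train on one season, test on another, and return the accuracy."""
    train, test = split_by_season(rows, train_season, test_season)
    if not train or not test:
        raise ValueError("both seasons need at least one match")
    model.fit([features(r, use_relegation) for r in train],
              [r["winner"] for r in train])
    predicted = model.predict([features(r, use_relegation) for r in test])
    hits = sum(p == r["winner"] for p, r in zip(predicted, test))
    return hits / len(test)
```

Running `run_experiment` once with `use_relegation=False` and once with `use_relegation=True`, on the same seasons and model, keeps the relegation score as the only difference between the two results.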

General Idea

We will investigate whether a machine learning approach is any better than randomly guessing the outcome. Additionally, we will check whether a team's closeness to relegation has an impact on how they play. Should we find that such an effect does appear to occur, we will investigate the threshold at which the effect starts.

To Do

Build the baseline prediction and calculate its accuracy for different history lengths
Redefine our closeness-to-relegation stat to include the theoretical remaining points a team could achieve, so we can measure closeness to relegation more accurately.
- The thinking is that, if a team has only a few more chances to make up enough points to leave the relegation set, then it is closer to relegation.
Define what measures to use when checking if a team is playing 'better'; is it simply number of goals scored?
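
For the second item above, one possible formulation (purely an assumption, since the stat has not been redefined yet) is the share of the remaining available points a team needs in order to reach safety. The name `relegation_pressure` and the parameters `season_games` and `safety_points` (the points total currently needed to sit outside the relegation set) are hypothetical.

```python
def relegation_pressure(points, games_played, season_games, safety_points):
    """Fraction of remaining points needed to reach safety, in [0, 1]."""
    if not 0 <= games_played <= season_games:
        raise ValueError("games_played must lie between 0 and season_games")
    remaining = 3 * (season_games - games_played)  # points still available
    gap = max(0, safety_points - points)           # points needed for safety
    if gap == 0:
        return 0.0  # already at or above the safety line
    if remaining < gap:
        return 1.0  # safety can no longer be reached
    return gap / remaining
```

With the same 6-point gap, a team with 8 games left scores 0.25, while a team with 2 games left scores 1.0, which matches the intuition that fewer remaining chances means being closer to relegation.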
