Matching

To run, fork the repository, then open "Course-Industry Matching.ipynb" in IPython Notebook. All important functions are explained there.

Analysis

This repository analyzes the likelihood of matching between two independent sets of data (e.g. Course to Industry). The algorithm first performs Content-Based Filtering on features extracted from text, and can then be updated dynamically through Collaborative Filtering on existing user profiles.

This likelihood is quantified using a matrix, where each entry describes the relative likelihood of a match. This representation is ideal because it scales with new data and is compatible with multi-criteria likelihoods (e.g. Course to Industry to Jobs): one only needs to multiply the respective matrices to obtain a new likelihood relationship.
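
As a minimal sketch of the chaining idea, assuming NumPy and two small made-up matrices (the labels and numbers are illustrative, not taken from the repository's data), multiplying a Course-to-Industry matrix by an Industry-to-Job matrix gives a Course-to-Job matrix:

```python
import numpy as np

# Hypothetical likelihoods; rows and columns are labelled in the comments.
# course_to_industry[i, j]: likelihood that course i matches industry j.
course_to_industry = np.array([
    [0.8, 0.2],   # MARKETING  -> (ADVERTISING, FINANCE)
    [0.1, 0.9],   # ACCOUNTING -> (ADVERTISING, FINANCE)
])

# industry_to_job[j, k]: likelihood that industry j matches job k.
industry_to_job = np.array([
    [0.7, 0.3],   # ADVERTISING -> (COPYWRITER, ANALYST)
    [0.2, 0.8],   # FINANCE     -> (COPYWRITER, ANALYST)
])

# Chaining the two criteria is a plain matrix product.
course_to_job = course_to_industry @ industry_to_job
print(course_to_job)
```

Each entry of the product sums over every intermediate industry, so a course can reach a job through more than one industry.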

Algorithm

The steps of the algorithm are as follows:

Data Mining / Data Gathering

Data Cleaning
- text normalization
- prefix removal
- abbreviation mapping
- internal respelling
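
The first three cleaning steps might look roughly like the sketch below. The prefix and abbreviation tables are hypothetical placeholders, not the repository's actual lists, and internal respelling is left out because it needs a spelling dictionary (such as the one pyenchant provides).

```python
import re

# Hypothetical lookup tables; the repository's real lists may differ.
PREFIXES = ("BS ", "BA ", "AB ")          # degree prefixes to strip
ABBREVIATIONS = {"MGT": "MANAGEMENT", "COMSCI": "COMPUTER SCIENCE"}

def clean(text):
    """Normalize a name, strip a degree prefix, expand abbreviations."""
    # Text normalization: uppercase, drop punctuation, collapse whitespace.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text.upper())
    text = re.sub(r"\s+", " ", text).strip()
    # Prefix removal: drop one leading degree prefix if present.
    for prefix in PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    # Abbreviation mapping: replace whole words found in the table.
    words = [ABBREVIATIONS.get(word, word) for word in text.split()]
    return " ".join(words)

print(clean("BS Mgt. Engineering"))
```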

Clustering
- Uses WORD STEMMING and WORD FREQUENCY
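
One way to combine stemming with word frequency is sketched below. This is an assumption about how the two could work together, not the repository's clustering code: the toy `crude_stem` function stands in for a real stemmer (such as the one in the `stemming` package), and each name is grouped under its most frequent stem.

```python
from collections import Counter, defaultdict

def crude_stem(word):
    """Toy stemmer: strip a few common suffixes (a stand-in for a real one)."""
    for suffix in ("ING", "ION", "S"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def cluster_by_top_stem(names):
    """Group names under the stem that is most frequent across all names."""
    stems = {name: [crude_stem(w) for w in name.split()] for name in names}
    frequency = Counter(s for stem_list in stems.values() for s in stem_list)
    clusters = defaultdict(list)
    for name, stem_list in stems.items():
        # The key stem is the one that occurs most often in the corpus.
        key = max(stem_list, key=lambda s: frequency[s])
        clusters[key].append(name)
    return dict(clusters)

names = ["MARKETING", "MARKETS", "FINANCE", "FINANCE MANAGEMENT"]
print(cluster_by_top_stem(names))
```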

Creation of Likelihood Matrix
- Content-based Filtering
- Uses cosine similarity of features
- TF-IDF vectorization of text
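
A minimal sketch of this step, assuming scikit-learn's `TfidfVectorizer` and `cosine_similarity` and two tiny made-up text lists (the notebook may build its features differently):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative inputs only; the real data lives in the repository's data folder.
courses = ["marketing management", "computer science"]
industries = ["marketing and advertising", "software and computer services"]

# Fit one vocabulary over both sets so the vectors share the same features.
vectorizer = TfidfVectorizer()
vectorizer.fit(courses + industries)

course_vectors = vectorizer.transform(courses)
industry_vectors = vectorizer.transform(industries)

# likelihood[i, j] is the cosine similarity of course i and industry j.
likelihood = cosine_similarity(course_vectors, industry_vectors)
print(likelihood.round(2))
```

Pairs that share no words get a similarity of zero, which is where the dynamic update from user profiles can add information that the text alone does not carry.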

Dynamic Update of Likelihood
- Collaborative Filtering
- Uses cosine similarity as well
- Increases likelihood for each new user info (example below)
  - user course: MARKETING
  - user work industry: FINANCE INDUSTRY
  - result: likelihood match of MARKETING and FINANCE increases
- Uses cross product of all possible keyword matches

Repeat the previous step (Dynamic Update of Likelihood) as new user info arrives

Python Requirements (through pip)

1) pyenchant
	- with AbiWord Enchant 
2) stemming
3) numpy
4) scipy
5) scikit-learn (imported as sklearn)
6) pandas
