Hikari9/Matching

Matching

To run, fork the repository, then open "Course-Industry Matching.ipynb" in IPython Notebook. All important functions are explained there.

Analysis

This repository analyzes the likelihood of a match between two independent sets of data (e.g. Course to Industry). The algorithm first applies Content-Based Filtering on text features, and can then be updated dynamically through Collaborative Filtering on existing user profiles.

This likelihood is quantified as a matrix, where each entry describes the relative likelihood of a match between one item from each set. The matrix form is convenient because it scales with new data and supports multi-criteria likelihoods (e.g. Course to Industry to Jobs): multiplying the respective matrices yields a new likelihood relationship.
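
As a minimal sketch of chaining likelihoods by matrix multiplication, the example below multiplies a Course-to-Industry matrix by an Industry-to-Jobs matrix. The variable names and all numeric values are made up for illustration and do not come from the notebook.

```python
import numpy as np

# Hypothetical likelihood matrices with made-up values.
# Rows: 2 courses; columns: 2 industries.
course_to_industry = np.array([
    [0.8, 0.2],
    [0.1, 0.9],
])

# Rows: 2 industries; columns: 3 jobs.
industry_to_job = np.array([
    [0.7, 0.3, 0.0],
    [0.0, 0.4, 0.6],
])

# Multiplying the two gives a Course-to-Jobs likelihood matrix:
# each entry sums, over all industries, the product of the
# course-industry and industry-job likelihoods.
course_to_job = course_to_industry @ industry_to_job
print(course_to_job)
```

The result has one row per course and one column per job, so the intermediate set (industries) no longer appears in the final relationship.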

Algorithm

The steps of the algorithm are as follows:

  1. Data Mining / Data Gathering

  2. Data Cleaning
    • text normalization
    • prefix removal
    • abbreviation mapping
    • internal respelling

  3. Clustering
    • Uses WORD STEMMING and WORD FREQUENCY

  4. Creation of Likelihood Matrix
    • Content-based Filtering
    • Uses cosine similarity of features
    • TF-IDF vectorization of text

  5. Dynamic Update of Likelihood
    • Collaborative Filtering
    • Uses cosine similarity as well
    • Increases likelihood for each new user info (example below)
      • user course: MARKETING
      • user work industry: FINANCE INDUSTRY
      • result: likelihood match of MARKETING and FINANCE increases
    • Uses the Cartesian (cross) product of all possible keyword matches

  6. Repeat step 5 for each new piece of user information
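
Steps 4 and 5 can be sketched roughly as follows, using scikit-learn's `TfidfVectorizer` and `cosine_similarity`. The sample labels, the function names `content_likelihood` and `update_with_user`, and the additive `weight` update rule are assumptions made for illustration; the notebook's actual functions and update rule may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical, already-cleaned keyword strings (not the repository's data).
courses = ["marketing management", "computer science", "financial accounting"]
industries = [
    "finance banking accounting",
    "software computer technology",
    "advertising marketing",
]


def content_likelihood(rows, cols):
    """Step 4 sketch: TF-IDF vectors, then pairwise cosine similarity."""
    vectorizer = TfidfVectorizer()
    # Fit on both sets so they share one vocabulary and are comparable.
    vectorizer.fit(rows + cols)
    return cosine_similarity(vectorizer.transform(rows), vectorizer.transform(cols))


def update_with_user(likelihood, row_idx, col_idx, weight=0.1):
    """Step 5 sketch (assumed rule): raise one entry for an observed pair."""
    updated = likelihood.copy()
    updated[row_idx, col_idx] += weight
    return updated


likelihood = content_likelihood(courses, industries)

# A user who took MARKETING and works in the FINANCE industry
# increases the likelihood of that course-industry pair.
marketing = courses.index("marketing management")
finance = industries.index("finance banking accounting")
updated = update_with_user(likelihood, marketing, finance)

print(np.round(likelihood, 2))
print(np.round(updated, 2))
```

In this sketch the content-based score for MARKETING and FINANCE starts at zero because the two strings share no words, so the user profile is the only source of evidence for that pair.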

Python Requirements (through pip)

  1. pyenchant
    • requires the AbiWord Enchant library
  2. stemming
  3. numpy
  4. scipy
  5. scikit-learn (imported as sklearn)
  6. pandas
