This project analyzes latent features (topics) in job descriptions. It addresses questions such as how valuable a skill set is, how much that value varies across companies, and which topics or skills contribute to high salaries. The average salary per topic is used to estimate the value of a topic or skill. Jobs are ranked within each topic, which makes it possible to determine how strongly each job's skills relate to that topic.
The data was collected from the following sources and merged on matching job titles:
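As a rough illustration of valuing a topic by salary and ranking jobs within a topic, the sketch below assumes a job-by-topic weight matrix is already available. The matrix `W`, the `avg_salary` values, and the choice of a weight-weighted mean are all assumptions made for the example; the project may compute its averages differently.

```python
import numpy as np

# Hypothetical job-by-topic weights (rows: jobs, columns: topics).
W = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
])
# Hypothetical average salary for each job (same row order as W).
avg_salary = np.array([40000.0, 100000.0, 70000.0])

# Assumed definition: salary per topic as a mean weighted by topic strength.
topic_salary = (W * avg_salary[:, None]).sum(axis=0) / W.sum(axis=0)

# Rank jobs within each topic, strongest weight first.
# Column k lists job indices ordered by their weight on topic k.
job_rank = np.argsort(-W, axis=0)

print("salary per topic:", topic_salary)
print("job ranking for topic 0:", job_rank[:, 0])
```

With these toy numbers the second topic comes out as the higher-paid one, because the highest-salary job carries most of its weight there.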
- Salaries [Salary, Date, State, City]: www.jobs-salary.com
- Job descriptions: indeed.com and simplyhired.com
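
A minimal sketch of merging the two sources on job title, using pandas. The column names, the toy records, and the `normalize_title` helper are hypothetical; the actual cleaning and matching rules used in this project are not described here.

```python
import pandas as pd

# Hypothetical toy records standing in for the scraped data.
salaries = pd.DataFrame({
    "job_title": ["Data Analyst", "data analyst ", "Support Agent"],
    "salary": [70000, 80000, 40000],
    "state": ["CA", "NY", "TX"],
})
descriptions = pd.DataFrame({
    "job_title": ["Data Analyst", "Support Agent", "Recruiter"],
    "description": [
        "analyze data with sql",
        "answer customer calls",
        "source candidates",
    ],
})

def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so near-identical titles match."""
    return " ".join(title.lower().split())

salaries["title_key"] = salaries["job_title"].map(normalize_title)
descriptions["title_key"] = descriptions["job_title"].map(normalize_title)

# Average the salary records per normalized title.
avg_salary = (
    salaries.groupby("title_key", as_index=False)["salary"]
    .mean()
    .rename(columns={"salary": "avg_salary"})
)

# Keep only the jobs that have both a description and salary data.
jobs = descriptions.merge(avg_salary, on="title_key", how="inner")
print(jobs[["job_title", "avg_salary"]])
```

An inner join drops titles that appear in only one source, such as "Recruiter" in this toy data.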
The dataset consists of 15,724 jobs across 12 companies and 88,334 salaries.
The matrices produced by NMF (non-negative matrix factorization) rank jobs per latent feature and words per latent feature. Each latent feature can therefore be identified by the words it ranks highly. The following are examples of some latent features:
The following image illustrates the top 400 words for this feature; the size of each word represents its rank:
The x-axis represents the strength (weight) of this latent feature (customer service) for each job, and the y-axis is the corresponding average salary. The plot shows that the more relevant a job description is to customer service, the lower its average salary.