LinkedinProfileParser

DESCRIPTION

This simple parser is aimed at extracting data from LinkedIn public profiles. The simple REST API is written using Python 2.7.2, Bottle 0.10.9 and Scrapy 0.14.1, the "Swiss Army knife" of web scraping.

API SPECIFICATION

URL for parsing competences and education:

localhost:8080/doparse?address=<public profile URL>
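Because the profile URL is itself passed as the value of the address query parameter, it is safer to percent-encode it whenever it could contain characters such as '&', '#' or '?'. A minimal client-side sketch in Python 3 (the helper name build_doparse_url and its host/port defaults are hypothetical, not part of this project):

```python
from urllib.parse import urlencode


def build_doparse_url(profile_url, host="localhost", port=8080):
    """Build a /doparse request URL for a public profile URL.

    Hypothetical helper: urlencode() percent-encodes ':', '/', '?', '&'
    and similar characters in profile_url so they cannot be mistaken for
    parts of the outer query string.
    """
    query = urlencode({"address": profile_url})
    return "http://{}:{}/doparse?{}".format(host, port, query)


print(build_doparse_url("http://fr.linkedin.com/in/vasylvaskul/"))
# → http://localhost:8080/doparse?address=http%3A%2F%2Ffr.linkedin.com%2Fin%2Fvasylvaskul%2F
```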

The output is JSON with the following format:

{
  "educations": [{"school": "XXX", "year_last": "YYY", "year_first": "ZZZ"}, ...],
  "tags": ["MYSQL Database design", "PYTHON", ...],
  "experiences": [{"title": "XXX", "company": "YYY", "year_last": "ZZZ", "year_first": "XXX", "description": "YYY"}, ...],
  "html": "XXX"
}
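On the client side, a successful response could be decoded into typed records. The following is a sketch only, in Python 3 (3.7+ for dataclasses); the class names Education, Experience and Profile and the function parse_profile are hypothetical. Keys missing from the body fall back to empty defaults, since the sample output in this README carries only "educations" and "tags".

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Education:
    school: str
    year_first: str
    year_last: str


@dataclass
class Experience:
    title: str
    company: str
    year_first: str
    year_last: str
    description: str


@dataclass
class Profile:
    educations: List[Education] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    experiences: List[Experience] = field(default_factory=list)
    html: str = ""


def parse_profile(body):
    """Decode a successful /doparse JSON body into a Profile.

    Assumes the body is not an error envelope. Absent keys become empty
    defaults; unexpected keys inside an education or experience object
    raise TypeError, which surfaces format changes early.
    """
    data = json.loads(body)
    return Profile(
        educations=[Education(**e) for e in data.get("educations", [])],
        tags=list(data.get("tags", [])),
        experiences=[Experience(**e) for e in data.get("experiences", [])],
        html=data.get("html", ""),
    )
```

json.loads also decodes \u escapes such as "Coll\u00e8ge" into ordinary Unicode strings.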

Sample request

localhost:8080/doparse?address=http://fr.linkedin.com/in/vasylvaskul/

Sample output

{"educations": [{"school": "Science Po, Coll\u00e8ge des Ing\u00e9nieurs, \u00c9cole des Mines de Paris", "year_last": "2011", "year_first": "2010"}, {"school": "Kyiv National Taras Shevchenko University", "year_last": "2007", "year_first": "2001"}, {"school": "Drohobych Lyceum at Drohobych State 'Ivan Franko' University", "year_last": "2001", "year_first": "1999"}], "tags": []}

In case of parsing problems, an error is returned, e.g.:

{"error": {"message": "HTTP Response 404", "code": X}}

where code X is one of the following:

1 - network problem
2 - page not found (404)
3 - bad format
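A client can turn the error envelope into an exception. This is a minimal Python 3 sketch; the names ParseErrorCode, ProfileParseError and check_response are hypothetical, and only the three code values come from the list above. Codes outside that list are passed through unchanged rather than rejected.

```python
import json
from enum import IntEnum


class ParseErrorCode(IntEnum):
    # Values as listed in this README.
    NETWORK_PROBLEM = 1
    PAGE_NOT_FOUND = 2
    BAD_FORMAT = 3


class ProfileParseError(Exception):
    """Raised when /doparse returns an error envelope (hypothetical)."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code
        self.message = message


def check_response(body):
    """Return the decoded body, or raise ProfileParseError on an error envelope."""
    data = json.loads(body)
    error = data.get("error")
    if error is None:
        return data
    raw_code = error.get("code")
    try:
        code = ParseErrorCode(raw_code)
    except ValueError:
        # Unknown or missing code: keep the raw value.
        code = raw_code
    raise ProfileParseError(code, error.get("message", ""))
```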

CACHING

Currently the system is stateless: every new request re-parses the page.
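Whether to cache is still an open question for this project. As one possible option, here is a minimal Python 3 sketch of an in-memory cache with a time-to-live, keyed by profile URL. TTLCache, cached_parse, the one-hour default and the parse callable are all hypothetical; the injectable clock only exists to make expiry testable.

```python
import time


class TTLCache:
    """Tiny in-memory cache for parsed profiles (hypothetical, not in the project).

    Entries expire ttl_seconds after they were stored; expired entries are
    dropped lazily when looked up.
    """

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # profile URL -> (stored_at, result)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl:
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock(), value)


def cached_parse(cache, address, parse):
    """Return the cached result for address, calling parse(address) on a miss."""
    result = cache.get(address)
    if result is None:
        result = parse(address)
        cache.put(address, result)
    return result
```

A per-process dictionary is not shared between server processes, and whether error responses should be cached is a separate design decision.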

TODO

- Parse experiences
- Store results in a database
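For the TODO item about storing results in a database, Python's built-in sqlite3 module is one possible starting point. The table layout and the store_result function below are assumptions, not something the project defines; experiences are left out because they are not parsed yet. Storing the same address again replaces its previous educations and tags.

```python
import sqlite3

# Hypothetical schema: one row per profile URL, child rows for educations and tags.
SCHEMA = """
CREATE TABLE IF NOT EXISTS profile (
    id INTEGER PRIMARY KEY,
    address TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS education (
    profile_id INTEGER NOT NULL REFERENCES profile(id),
    school TEXT,
    year_first TEXT,
    year_last TEXT
);
CREATE TABLE IF NOT EXISTS tag (
    profile_id INTEGER NOT NULL REFERENCES profile(id),
    name TEXT
);
"""


def store_result(conn, address, result):
    """Store one decoded /doparse result and return its profile id."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT OR IGNORE INTO profile (address) VALUES (?)", (address,))
        (profile_id,) = conn.execute(
            "SELECT id FROM profile WHERE address = ?", (address,)
        ).fetchone()
        # Replace any previously stored rows for this profile.
        conn.execute("DELETE FROM education WHERE profile_id = ?", (profile_id,))
        conn.execute("DELETE FROM tag WHERE profile_id = ?", (profile_id,))
        conn.executemany(
            "INSERT INTO education (profile_id, school, year_first, year_last) "
            "VALUES (?, ?, ?, ?)",
            [(profile_id, e.get("school"), e.get("year_first"), e.get("year_last"))
             for e in result.get("educations", [])],
        )
        conn.executemany(
            "INSERT INTO tag (profile_id, name) VALUES (?, ?)",
            [(profile_id, t) for t in result.get("tags", [])],
        )
    return profile_id
```

Usage: open a connection with sqlite3.connect("profiles.db"), run conn.executescript(SCHEMA) once, then call store_result(conn, address, result) with the decoded JSON dictionary.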

OPEN ISSUES

- Use the LinkedIn API to parse the provider's profile, but with the users' access_token?
- Should requests be cached or not?

HOWTO RUN

To start the server, run:

python main.py
