This is a quick-and-dirty LinkedIn profile crawler written in Python, using Pattern as the HTML parser and MongoDB as local storage. The data collected includes a person's education, work experience and skill set.
- [Pattern](http://www.clips.ua.ac.be/pattern)
- [Requests](http://docs.python-requests.org/en/latest/)
- [MongoDB](https://www.mongodb.org/)
- Run a MongoDB server (see http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/).
- Set the regions you want to crawl in settings.py, e.g. Hong Kong or Taiwan.
- Get a few seed public profiles from LinkedIn and add them to settings.py, for example:

      # settings.py
      CRAWL_REGIONS = ['Hong Kong']
      SEED_PROFILES = ['https://www.linkedin.com/in/simonsiuhk']
- Run LinkedInCrawler.py.
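
For orientation, here is a minimal sketch of how the two settings could drive a crawl: a breadth-first walk that starts at the seed profiles and keeps only profiles in the configured regions. It is not the code in LinkedInCrawler.py. The `crawl` function, its `fetch_profile` and `save_profile` parameters, the `max_profiles` limit, and the `region` and `related` keys are all hypothetical names chosen for illustration.

```python
from collections import deque

# Same values as the settings.py example.
CRAWL_REGIONS = ['Hong Kong']
SEED_PROFILES = ['https://www.linkedin.com/in/simonsiuhk']


def crawl(seeds, regions, fetch_profile, save_profile, max_profiles=100):
    """Breadth-first crawl from the seed URLs; returns the number stored.

    Assumptions (hypothetical, not taken from the project):
    - fetch_profile(url) returns None on failure, or a dict with the keys
      'region' (str) and 'related' (list of other public profile URLs).
    - save_profile(profile) persists one profile dict.
    """
    queue = deque(seeds)
    seen = set(seeds)  # URLs already queued, so nothing is fetched twice
    stored = 0
    while queue and stored < max_profiles:
        url = queue.popleft()
        profile = fetch_profile(url)
        if profile is None:
            continue  # download or parsing failed; skip this URL
        if profile.get('region') not in regions:
            continue  # outside the configured regions: neither store nor expand
        save_profile(profile)
        stored += 1
        for link in profile.get('related', []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return stored
```

In a real run, `fetch_profile` would download the page with Requests and parse it with Pattern, and `save_profile` could be something like a pymongo collection's `insert_one` method. Expanding only in-region profiles is one possible design choice: it keeps the crawl from drifting into other regions, at the cost of missing in-region profiles that are reachable only through out-of-region ones.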