A quick and dirty implementation of a LinkedIn profile crawler and parser using pattern and mongodb as storage

Aracktus/LinkedIn-Profile-Crawler

 
 

LinkedIn-Profile-Crawler

Description

A quick and dirty LinkedIn profile crawler written in Python, using Pattern as the HTML parser and MongoDB as local storage. The data collected includes a person's education history, work experience and skill set.
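As a rough illustration of the data described above, here is a minimal sketch of how one crawled profile might be laid out before it is stored. The class names, field names and sample values are all hypothetical and are not taken from the repository; the sketch only assumes that each profile carries education, work experience and skills.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Education:
    # Hypothetical fields; the crawler's real schema is not documented here.
    school: str
    degree: str = ""
    period: str = ""


@dataclass
class Experience:
    # Hypothetical fields, as above.
    company: str
    title: str = ""
    period: str = ""


@dataclass
class Profile:
    url: str
    region: str
    education: List[Education] = field(default_factory=list)
    experience: List[Experience] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)

    def to_document(self) -> dict:
        """Return a plain nested dict, the kind of shape a document store accepts."""
        return asdict(self)


# Placeholder values only, not real crawled data.
example = Profile(
    url="https://example.invalid/in/placeholder",
    region="Hong Kong",
    education=[Education(school="Example University", degree="BSc")],
    experience=[Experience(company="Example Ltd", title="Engineer")],
    skills=["Python"],
)
doc = example.to_document()
```

A nested dict like `doc` maps naturally onto a single MongoDB document, which is one plausible reason to keep education and experience as embedded lists rather than separate collections.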

Requirements of 3rd party libraries

Usage

  1. Start a MongoDB server (see http://docs.mongodb.org/manual/tutorial/manage-mongodb-processes/)

  2. Set the regions you want to crawl in settings.py, e.g. Hong Kong or Taiwan.

  3. Collect a few public LinkedIn profile URLs to use as seeds and add them to settings.py, for example:

    # settings.py
    
    CRAWL_REGIONS = ['Hong Kong']
    SEED_PROFILES = ['https://www.linkedin.com/in/simonsiuhk']

  4. Run LinkedInCrawler.py
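The repository's actual crawl loop is not shown on this page. The following is a minimal sketch of how seed profiles and a region filter could drive a crawl, assuming a breadth-first traversal. The `crawl` function, the `fetch` callback and the `FAKE_WEB` data are hypothetical stand-ins: in a real run, `fetch` would download a profile page and parse it, while here it only looks up a small in-memory graph of placeholder URLs.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

# Hypothetical fetcher: given a profile URL, return (region, linked profile URLs).
Fetcher = Callable[[str], Tuple[str, Iterable[str]]]


def crawl(seeds: List[str], regions: List[str], fetch: Fetcher,
          limit: int = 100) -> List[str]:
    """Breadth-first crawl from the seeds, keeping only in-region profiles."""
    queue = deque(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(queue)
    kept = []
    while queue and len(kept) < limit:
        url = queue.popleft()
        region, links = fetch(url)
        if region not in regions:
            continue  # out-of-region profile: neither kept nor expanded
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


# Placeholder graph standing in for real pages (assumption, not real data).
BASE = "https://example.invalid/in/"
FAKE_WEB = {
    BASE + "seed": ("Hong Kong", [BASE + "a", BASE + "b"]),
    BASE + "a": ("Hong Kong", [BASE + "seed", BASE + "c"]),
    BASE + "b": ("Taiwan", [BASE + "c", BASE + "d"]),
    BASE + "c": ("Hong Kong", []),
    BASE + "d": ("Hong Kong", []),
}


def fake_fetch(url: str) -> Tuple[str, Iterable[str]]:
    return FAKE_WEB.get(url, ("", []))


result = crawl([BASE + "seed"], ["Hong Kong"], fake_fetch)
```

In this sketch an out-of-region profile is not expanded, so profile `d`, which is linked only from the Taiwan profile `b`, is never visited. Whether the real crawler follows links from out-of-region profiles is not stated in the source.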
