Scraping workshop

Where are the challenges?

The challenges are here.

All data come from the Titanic disaster (does it remind you of Kaggle?).

How to complete the challenge?

Step 1: Install the prerequisites

Install Python 2.7

This workshop targets Scrapy on Python 2.7.

Please install Python 2.7, and not Python 3.x!
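To confirm which interpreter is on your path before installing anything else, a small check like the one below may help. It is only a sketch: the function name is_python_27 is made up for this example and is not part of the workshop repository.

```python
import sys


def is_python_27(version_info=None):
    """Return True when the given (or running) interpreter is Python 2.7.x.

    Hypothetical helper for illustration; not part of the workshop code.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (2, 7)


if __name__ == "__main__":
    if is_python_27():
        print("OK: running Python 2.7")
    else:
        print("This workshop expects Python 2.7, found %d.%d" % sys.version_info[:2])
```

Save it under any file name and run it with the python command you intend to use for the workshop.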

Install dependencies

On Ubuntu 16:

sudo apt-get install python-dev python-pip libssl-dev libxml2-dev libxslt1-dev libffi-dev

On Windows:

Download and install Anaconda Distribution for Python 2.7.

On Mac OS X:

brew install python

Install Scrapy, Scrapoxy and ScrapingHub tools

sudo pip install scrapy scrapoxy shub

Step 2: Clone the repository

git clone https://github.com/fabienvauchelles/scraping-challenge-workshop.git
cd scraping-challenge-workshop

Step 3: Edit your scraper to complete the challenge

The scraper code is in the file myscraper/spiders/myscraper.py.

The item definitions are in the file myscraper/items.py.

Step 4: Start the scraper

cd scraping-challenge-workshop
scrapy crawl myscraper -t jsonlines -o persons.json

The exported items are written to the file persons.json, one JSON object per line.
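Because the jsonlines format stores one JSON object per line, the export cannot be read with a single json.load call. A minimal sketch for loading it back follows; the helper name load_items is hypothetical, and the keys of each item depend on the fields you defined.

```python
import json


def load_items(path):
    """Read a JSON Lines export: one JSON object per non-empty line."""
    items = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line:
                items.append(json.loads(line))
    return items
```

For example, len(load_items("persons.json")) gives the number of items the crawl exported.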

Licence

See LICENCE.txt.
