
how to use

Make sure you have Scrapy and the following packages installed:

pip install scrapyd
pip install beautifulsoup4
pip install lxml
pip install chardet
pip install cssselect
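A quick way to confirm the installs worked is to check that each package can be found from Python. This is only a sketch: the import names below are assumptions based on the pip package names above (note that beautifulsoup4 is imported as bs4), and the helper name missing_packages is made up for illustration.

```python
from importlib.util import find_spec

# Import names assumed for the pip packages listed above.
# beautifulsoup4 is imported as "bs4".
REQUIRED = ["scrapy", "scrapyd", "bs4", "lxml", "chardet", "cssselect"]


def missing_packages(names=REQUIRED):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, re-run the matching pip install line.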

play around with scrapy to understand how it does css or xpath selection

Selection works much the same way as in Beautiful Soup, jsoup, lxml, or other HTML parsers.

cd ehealth;
scrapy shell http://ehealthforum.com/health/epilepsy.html
response.xpath("//h1")
response.css(".fp_h2")
Press Ctrl-D to exit the shell (on Windows, Ctrl-Z followed by Enter).

run a crawler

scrapy crawl epilepsy_ehealthforums_spider

You should see the scraped items being logged as the crawl runs.

explanation of the hepc_hepmag_spider

This spider has two rules:

Rule 1: use CSS selectors to get the links to all the topics.
Rule 2: use CSS selectors to get the pagination links and follow them, but do not follow links to user profiles.

Then there is a parse method. By the time it runs, the spider has landed on an individual topic's page. It picks out each of the posts and puts them into the Item object, which is defined in items.py. The fields there are examples of good-to-have fields for an Item.

For more information

See http://doc.scrapy.org/en/latest/topics/api.html and https://github.com/Axiologue/ShoeScraper


WangMing212/ehealth_scraper
