IRRCrawler

Crawler used to collect car ads from irr.ru. IRR is one of the largest sites with free ads in the Russian segment of the network. The collected data are stored in the PostgreSQL database and used by the customer for his purposes. Besides, "fresh" ads are stored in an Excel-spreadsheet that is sent to a customer's employee by e-mail.

Change value of the attribute start_urls in irr_spider module in order to specify the brands of cars that interest you. Now spider collects only ads about selling of UAZ.

Technical details

The following tools were used: Python, Scrapy, Selenium, SQLAlchemy and xlwt. Scrapy is used to extract, process, and store the information from webserver's responses. Since the website requires support of JavaScript and Cookies for correct interaction, we provided it using Selenium, replacing the standard Scrapy logic of the sending of requests and receiving of responses. SQLAlchemy is used to interact with a database. The xlwt package is used to export the data in Excel. Crawler uses the Polipo HTTP-proxy on our remote host in order to hide the real IP-address of its server.

Usage

You can use make to perform basic operations.

To deploy configs:

make deploy_configs

To run unit tests:

make test

To run spider:

make start

License

The BSD 3-Clause License. See LICENSE file in project root directory.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commits
uaz		uaz
.gitignore.default		.gitignore.default
LICENSE		LICENSE
Makefile		Makefile
README.rst		README.rst
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uaz

uaz

.gitignore.default

.gitignore.default

LICENSE

LICENSE

Makefile

Makefile

README.rst

README.rst

requirements.txt

requirements.txt

Repository files navigation

IRRCrawler

Technical details

Usage

License

About

Releases

Packages

Languages

License

antonkonyshev/irrcrawler

Folders and files

Latest commit

History

Repository files navigation

IRRCrawler

Technical details

Usage

License

About

Resources

License

Stars

Watchers

Forks

Languages