Free proxy server based on Tornado and Scrapy.
Build your own proxy pool!
Features:
- continuously crawls and provides free proxies
- friendly and easy-to-use HTTP API
- asynchronous and high-performance
- supports high concurrency
- automatically checks proxies periodically and ditches unavailable ones
This project has been tested on:
- Archlinux; Python-3.6.5
- Debian(wsl); Python-3.5.3
Windows is not supported for now...
- Install base requirements:
python >= 3.5 (I use Python-3.6.5)
redis
- Clone this repo.
- Install python packages by:
pip install -r requirements.txt
- Read the config and modify it according to your needs.
- Start the server:
python ./src/main.py
- Then use the APIs to get proxies.
typical response:
{
"code": 0,
"msg": "ok",
"data": {
...
}
}
- code: result of the event (not the HTTP status code), 0 for success
- msg: message for a failed event
- data: details for a successful event
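A caller can unwrap this envelope with a small helper. A minimal sketch, assuming the documented envelope shape (the `parse_response` name is illustrative, not part of the project):

```python
import json

def parse_response(body: str) -> dict:
    """Unwrap the API envelope: return `data` on success, raise on failure."""
    payload = json.loads(body)
    if payload["code"] != 0:  # event result code, 0 means success
        raise RuntimeError(payload.get("msg", "unknown error"))
    return payload["data"]

# With the documented success envelope:
sample = '{"code": 0, "msg": "ok", "data": {"count": 1, "items": []}}'
print(parse_response(sample)["count"])  # → 1
```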
GET /api/proxy/

params | Must/Optional | detail | default
---|---|---|---
count | O | the number of proxies you need | 1
scheme | O | choices: HTTP, HTTPS | both*
anonymity | O | choices: transparent, anonymous | both
(TODO) sort_by_speed | O | choices: 1: descending order, 0: no order, -1: ascending order | 0

- both: include all types, not grouped
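Assembling the query string from these parameters can look like this; a sketch assuming the default port 12345 (the `proxy_api_url` helper is mine, not part of the project):

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:12345"  # assumes the default HTTP_PORT

def proxy_api_url(count=1, scheme=None, anonymity=None):
    """Build a GET /api/proxy/ URL; omitted params use the server-side defaults."""
    params = {"count": count}
    if scheme is not None:
        params["scheme"] = scheme
    if anonymity is not None:
        params["anonymity"] = anonymity
    return f"{BASE}/api/proxy/?{urlencode(params)}"

print(proxy_api_url(10, "HTTP", "anonymous"))
# → http://127.0.0.1:12345/api/proxy/?count=10&scheme=HTTP&anonymity=anonymous
```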
example
- To acquire 10 anonymous proxies in HTTP scheme:

GET /api/proxy/?count=10&scheme=HTTP&anonymity=anonymous

The response:

{
  "code": 0,
  "msg": "ok",
  "data": {
    "count": 9,
    "items": [
      {
        "port": 2000,
        "ip": "xxx.xxx.xx.xxx",
        "scheme": "HTTP",
        "url": "http://xxx.xxx.xxx.xx:xxxx",
        "anonymity": "transparent"
      },
      ...
    ]
  }
}
Check the server status, including:
- Running spiders
- Stored proxies
GET /api/status/
No params.
Path: {repo}/src/config/common.py

HTTP_PORT
- which HTTP port to run on (default: 12345)

CONSOLE_OUTPUT
- if set to 1, the server prints logs to the console instead of to a file (default: 1)

LOG
- log config, including: level, dir and filename
- logging to a file requires CONSOLE_OUTPUT = 0

REDIS
- Redis database config, including: host, port, db

PROXY_STORE_NUM
- the number of proxies you need (default: 500)
- After reaching this number, the crawler stops crawling new proxies.
- Set it depending on your needs.

PROXY_STORE_CHECK_SEC
- the period in which every proxy is re-checked
- It applies to each single proxy, not to the checker spider.
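Put together, the config file might look roughly like this. The LOG values and the PROXY_STORE_CHECK_SEC number are illustrative guesses, not the shipped defaults; check the file itself:

```python
HTTP_PORT = 12345            # HTTP port the API server listens on (default: 12345)
CONSOLE_OUTPUT = 1           # 1: log to console; 0: log to file per LOG below
LOG = {
    "level": "info",         # illustrative value
    "dir": "log",            # illustrative value
    "filename": "server.log" # illustrative value
}
REDIS = {                    # assumes a local Redis with default settings
    "host": "127.0.0.1",
    "port": 6379,
    "db": 0,
}
PROXY_STORE_NUM = 500        # stop crawling new proxies once this many are stored
PROXY_STORE_CHECK_SEC = 3600 # illustrative: re-check each proxy every hour
```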
Growing……
Support:
I need your feedback to make it better.
Please create an issue for any problems or advice.
Known bugs:
- Many weird `None`s... thought to be related to thread unsafety
- Blocking while using Tornado-4.5.3
- Split out the log module
- More detailed API
- Docker support
- Web frontend via Bootstrap
- Add a user-agent pool