TWINT Flask-Celery Server
Optimized tweet scraping
See also Twint Kibana
- Python3, Twint, Flask, Celery
- Elasticsearch (v7)
- RabbitMQ
- (optional) Flower
- Run Celery workers:
$ celery worker --app=worker.celery --hostname=worker.fetching@%h --queues=fetching --loglevel=info
- (Optional) Run a worker for the saving queue, which handles the progress-reporting task if it is implemented:
$ celery worker --app=worker.celery --hostname=worker.saving@%h --queues=saving --loglevel=info
- Run Flask server:
$ python3 app.py
- (Optional) Monitor Celery with Flower:
$ celery -A app.celery flower --broker='pyamqp://guest@localhost//'
- Create ES index with index-tweets.json
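Assuming Elasticsearch is reachable at localhost:9200 and the index is named `tweets` (an example name; match it to the `Index_tweets` value you send to the server), the index can be created from the mapping file with curl:

```shell
# Create the index from the mapping in index-tweets.json
# ("tweets" is an example index name -- adjust to your Index_tweets value)
curl -X PUT "localhost:9200/tweets" \
     -H "Content-Type: application/json" \
     -d @index-tweets.json
```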
- Start tweet fetching
- arguments are mapped to twint config
- I mainly use it with Elasticsearch, so I have not tested it with other arguments
- Since, Until, and either Search or User are required
POST http://localhost:5000/fetch
{
  "Since": "2019-2-1",
  "Until": "2019-3-1",
  "Search": "<keyword>",
  // or "User": "<username>"
  "Elasticsearch": "localhost:9200",
  "Index_tweets": "<es index name>"
}
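A quick way to call the endpoint from Python, as a sketch: the field names follow the example above, the URL assumes the default Flask port 5000, and `build_fetch_payload` is a hypothetical helper (not part of this project) that enforces the required-field rule.

```python
import json
import urllib.request


def build_fetch_payload(since, until, elasticsearch, index_tweets,
                        search=None, user=None):
    """Build the JSON body for POST /fetch.

    Since, Until, and either Search or User are required,
    per the notes above. (Helper is illustrative, not part
    of the server's API.)
    """
    if not (search or user):
        raise ValueError("Provide either Search or User")
    payload = {
        "Since": since,
        "Until": until,
        "Elasticsearch": elasticsearch,
        "Index_tweets": index_tweets,
    }
    if search:
        payload["Search"] = search
    else:
        payload["User"] = user
    return payload


payload = build_fetch_payload("2019-2-1", "2019-3-1",
                              "localhost:9200", "<es index name>",
                              search="<keyword>")
req = urllib.request.Request(
    "http://localhost:5000/fetch",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment with the server running
```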