Skip to content
forked from tq010or/acl2013

Code and pre-trained model used in ACL 2013 demo paper

License

Notifications You must be signed in to change notification settings

silviojr/acl2013

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

===========
Geolocation Prediction for Public Twitter Users
===========

This package provides a city-level geolocation prediction for non-protected Twitter users.

The code and data is used in the following paper:

Bo Han, Paul Cook and Timothy Baldwin, (to appear) A Stacking-based Approach to Twitter User Geolocation Prediction, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Demo Session, Sofia, Bulgaria.



Quick Start
=========
1. The system requires the following prerequisites:
* python 2.65 (python 2.7.3 is also tested)
* ujson (https://pypi.python.org/pypi/ujson) 
* twython library (git://github.com/ryanmcgrath/twython.git)
* liblinear (http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/liblinear.cgi?+http://www.csie.ntu.edu.tw/~cjlin/liblinear+zip)
* Flask (for running new server) pip install flask
* Langid (for running new server) pip install langid


2. Set environment variable.
Suppose you are using Ubuntu Linux and you have extracted the package to your home folder, e.g., /home/YOUR_NAME/acl2013.
$ cd ~
Update the following "exports" in /home/YOUR_NAME/acl2013/rest/run.sh
export PYTHONPATH=/home/YOUR_NAME/acl2013:$PYTHONPATH
export geoloc=/home/YOUR_NAME/acl2013/geoloc


3. get pre-trained models and set your program credentials
Decompress pre-trained models as follows,
$ cd /home/YOUR_NAME/acl2013/geoloc/models
$ tar -zxvf world.models.tar.gz
Then save your credentials in four lines in /home/YOUR_NAME/acl2013/geoloc/data/credentials.txt according to the following order.
consumer_token
consumer_secret
access_token
access_secret


4.1 start new RESTful service
$ cd /home/YOUR_NAME/acl2013/rest/
$ ./run.sh
It is recommended to use "screen" command to simulate multiple terminals.

4.2 set up old original server
$ cd /home/YOUR_NAME/acl2013/geoloc/classifiers
$ python text_xmlrpc_server.py 
$ python loc_xmlrpc_server.py 
$ python tz_xmlrpc_server.py 
$ python geo_xmlrpc_server.py

Test the live service
This little demo geolocates authors
$ cd /home/YOUR_NAME/acl2013/api
$ python geoloc_cli.py

Test the website (currently only Google Chrome browser is supported).
$ cd /home/YOUR_NAME/acl2013/web
$ python web_app.py



Experiment Replication
=========
Because of Twitter API policy, we are not able to share the raw data, but provide the user ids we used for our experiments (see more in the paper).
You can crawl them for evaluation. Please note that because of the tweet and user account deletion in Twitter, some user tweets may be not available after the release.

The English user screen names are put in /home/acl2013/evaluation/data/live.unames, one screen name per user. Note this also include users with only a few geo-tagged tweets or with lower than 50% coherence. We exclude them in later evaluation, the process is transparent.
We only use LIVETEST (see more in the paper) for demonstration.

When crawling data using Twitter API, please save all the raw tweet dump in /home/YOUR_NAME/evaluation/live and name the tweet dump after the user name.
Suppose a user named "tuser", when you run following command,
$ less /home/YOUR_NAME/evaluation/live/tuser
the following format data will show
[{tweet dump_1}, {tweet dump_2}, ... {tweet dump_n}]
It is a list of tweet dump (in JSON dict format).

To run the evaluation, simply run
$ python /home/YOUR_NAME/evaluation/code/eval_livetest.py

If everything is correct, you should get the prediction results and summary (31629 lines in total) printed in the standard output.
The last five lines are like:
Acc: 0.405957500632
Acc161: 0.613995699469
AccCountry: 0.900613458133
Median: 39.9827729857
Mean: 937.655151249



Citations
=========
Please cite the following paper whenever appropriate:

Bo Han, Paul Cook and Timothy Baldwin, (to appear) A Stacking-based Approach to Twitter User Geolocation Prediction, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Demo Session, Sofia, Bulgaria.



MISC
=========
We also offer some off-the-shelf demos, but the availability of these services are without warranty.

For web access: <http://hum.csse.unimelb.edu.au:9000/geo.html>
For Twitter bot access: @melbltfsd



TODOs:
=========
0. Lots of refactoring work.
1. Documentation.
2. Parallel decoding (using openMP for instance).


Contact
=========
Comments and suggestions are highly welcomed.

Bo Han, hanb@student.unimelb.edu.au

About

Code and pre-trained model used in ACL 2013 demo paper

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 71.3%
  • JavaScript 13.5%
  • CSS 10.1%
  • HTML 4.8%
  • Shell 0.3%