-
Notifications
You must be signed in to change notification settings - Fork 3
Code and pre-trained model used in ACL 2013 demo paper
License
tq010or/acl2013
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
=========== Geolocation Prediction for Public Twitter Users =========== This package provides a city-level geolocation prediction for non-protected Twitter users. The code and data is used in the following paper: Bo Han, Paul Cook and Timothy Baldwin, (to appear) A Stacking-based Approach to Twitter User Geolocation Prediction, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Demo Session, Sofia, Bulgaria. Quick Start ========= 1. The system requires the following prerequisites: * python 2.65 (python 2.7.3 is also tested) * ujson (https://pypi.python.org/pypi/ujson) * twython library (git://github.com/ryanmcgrath/twython.git) * liblinear (http://www.csie.ntu.edu.tw/~cjlin/cgi-bin/liblinear.cgi?+http://www.csie.ntu.edu.tw/~cjlin/liblinear+zip) * Flask (for running new server) pip install flask * Langid (for running new server) pip install langid 2. Set environment variable. Suppose you are using Ubuntu Linux and you have extracted the package to your home folder, e.g., /home/YOUR_NAME/acl2013. $ cd ~ Update the following "exports" in /home/YOUR_NAME/acl2013/rest/run.sh export PYTHONPATH=/home/YOUR_NAME/acl2013:$PYTHONPATH export geoloc=/home/YOUR_NAME/acl2013/geoloc 3. get pre-trained models and set your program credentials Decompress pre-trained models as follows, $ cd /home/YOUR_NAME/acl2013/geoloc/models $ tar -zxvf world.models.tar.gz Then save your credentials in four lines in /home/YOUR_NAME/acl2013/geoloc/data/credentials.txt according to the following order. consumer_token consumer_secret access_token access_secret 4.1 start new RESTful service $ cd /home/YOUR_NAME/acl2013/rest/ $ ./run.sh It is recommended to use "screen" command to simulate multiple terminals. 4.2 set up old original server $ cd /home/YOUR_NAME/acl2013/geoloc/classifiers $ python text_xmlrpc_server.py $ python loc_xmlrpc_server.py $ python tz_xmlrpc_server.py $ python geo_xmlrpc_server.py Test the live service This little demo geolocates authors $ cd /home/YOUR_NAME/acl2013/api $ python geoloc_cli.py Test the website (currently only Google Chrome browser is supported). $ cd /home/YOUR_NAME/acl2013/web $ python web_app.py Experiment Replication ========= Because of Twitter API policy, we are not able to share the raw data, but provide the user ids we used for our experiments (see more in the paper). You can crawl them for evaluation. Please note that because of the tweet and user account deletion in Twitter, some user tweets may be not available after the release. The English user screen names are put in /home/acl2013/evaluation/data/live.unames, one screen name per user. Note this also include users with only a few geo-tagged tweets or with lower than 50% coherence. We exclude them in later evaluation, the process is transparent. We only use LIVETEST (see more in the paper) for demonstration. When crawling data using Twitter API, please save all the raw tweet dump in /home/YOUR_NAME/evaluation/live and name the tweet dump after the user name. Suppose a user named "tuser", when you run following command, $ less /home/YOUR_NAME/evaluation/live/tuser the following format data will show [{tweet dump_1}, {tweet dump_2}, ... {tweet dump_n}] It is a list of tweet dump (in JSON dict format). To run the evaluation, simply run $ python /home/YOUR_NAME/evaluation/code/eval_livetest.py If everything is correct, you should get the prediction results and summary (31629 lines in total) printed in the standard output. The last five lines are like: Acc: 0.405957500632 Acc161: 0.613995699469 AccCountry: 0.900613458133 Median: 39.9827729857 Mean: 937.655151249 Citations ========= Please cite the following paper whenever appropriate: Bo Han, Paul Cook and Timothy Baldwin, (to appear) A Stacking-based Approach to Twitter User Geolocation Prediction, In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Demo Session, Sofia, Bulgaria. MISC ========= We also offer some off-the-shelf demos, but the availability of these services are without warranty. For web access: <http://hum.csse.unimelb.edu.au:9000/geo.html> For Twitter bot access: @melbltfsd TODOs: ========= 0. Lots of refactoring work. 1. Documentation. 2. Parallel decoding (using openMP for instance). Contact ========= Comments and suggestions are highly welcomed. Bo Han, [email protected]
About
Code and pre-trained model used in ACL 2013 demo paper
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published