A Python script to backup your tweets into an offline text file

adlorenz/twitbak

Twitbak

Twitbak is a simple Python script that creates a backup of your public Twitter timeline in a local plain-text file.

Rationale

In case you didn't know, Twitter only allows the last 3,200 tweets of any user's timeline to be retrieved via the API, so if you've posted, say, 3,500 tweets (including retweets and replies), the first 300 would be lost somewhere in the distant Twitterverse forever.

I accidentally found out about this via Zach Holman's blog article the other day and thought that writing a simple fetcher script to dump tweets into an offline text file would be a nice Python exercise for a PHP'er keen to learn more Python.

Usage

Retrieve tweets from username's public timeline (excluding replies) into a tweets.txt file in the current directory, with tweets stored in reverse chronological order:

$ twitbak.py username

Run in auto-mode: twitbak automatically determines the most recent tweet ID already saved in the output file and retrieves only tweets newer than that. Any tweets retrieved in auto-mode are inserted at the beginning of the output file:

$ twitbak.py -a username
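A rough sketch of how auto-mode could work, assuming the output file is newest-first with the tweet ID in the last tab-separated column (see the output file format below) and that newer tweets have numerically larger IDs. The helpers `latest_tweet_id` and `prepend_lines` are hypothetical names, not twitbak's actual code, and the "fetched" list stands in for real API results.

```python
import os
import tempfile


def latest_tweet_id(path):
    """Return the newest saved tweet ID in `path`, or None if there is none.

    Assumes the file is newest-first and each line is 'body<TAB>date<TAB>id',
    so the newest ID is the last field of the first line.
    """
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        first = f.readline().rstrip("\n")
    if not first:
        return None
    return int(first.split("\t")[-1])


def prepend_lines(path, new_lines):
    """Insert `new_lines` (already newest-first) at the top of `path`."""
    old = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            old = f.read()
    with open(path, "w", encoding="utf-8") as f:
        for line in new_lines:
            f.write(line + "\n")
        f.write(old)


# Demo: one tweet already saved, then an auto-mode run.
path = os.path.join(tempfile.mkdtemp(), "tweets.txt")
prepend_lines(path, ["older tweet\t2011-05-01\t100"])
since = latest_tweet_id(path)
# Stand-in for API results; only tweets newer than `since` are kept.
fetched = [("newer tweet", "2011-05-02", 105), ("older tweet", "2011-05-01", 100)]
new = [f"{b}\t{d}\t{i}" for b, d, i in fetched if i > since]
prepend_lines(path, new)
print(latest_tweet_id(path))  # 105
```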

To include replies, i.e. tweets starting with an @ sign (twitbak ignores them by default):

$ twitbak.py -r username
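The reply rule above boils down to a prefix check. This is a minimal sketch, assuming tweets are held as `(body, date, id)` tuples; `keep_tweets` is a hypothetical helper, not part of twitbak.

```python
def keep_tweets(tweets, include_replies=False):
    """Filter (body, date, id) tuples, dropping replies unless asked not to.

    A reply is taken to be any tweet whose body starts with '@'.
    """
    return [t for t in tweets if include_replies or not t[0].startswith("@")]


tweets = [("Hello world", "2011-05-01", 1), ("@zach thanks!", "2011-05-02", 2)]
print(len(keep_tweets(tweets)))        # 1  (default: replies ignored)
print(len(keep_tweets(tweets, True)))  # 2  (like running with -r)
```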

To specify a different output file:

$ twitbak.py -o /path/to/file.txt username
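For readers curious how these switches could be wired up, here is a sketch using the standard library's argparse. Only the short flags `-a`, `-r`, `-o` and `-p` come from this README; the `dest` names, help strings and the default start page of 1 are assumptions, and twitbak itself may parse its arguments differently.

```python
import argparse


def build_parser():
    # Mirrors the documented options; dest names and defaults are assumptions.
    p = argparse.ArgumentParser(prog="twitbak.py",
                                description="Back up a Twitter timeline to a text file.")
    p.add_argument("username", help="owner of the timeline to back up")
    p.add_argument("-a", dest="auto", action="store_true",
                   help="only fetch tweets newer than the newest one already saved")
    p.add_argument("-r", dest="replies", action="store_true",
                   help="include replies (tweets starting with @)")
    p.add_argument("-o", dest="output", default="tweets.txt",
                   help="output file (default: tweets.txt)")
    p.add_argument("-p", dest="page", type=int, default=1,
                   help="timeline page to start or resume from")
    return p


args = build_parser().parse_args(["-r", "-o", "/tmp/me.txt", "someuser"])
print(args.username, args.replies, args.auto, args.output, args.page)
# someuser True False /tmp/me.txt 1
```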

API limitation

Twitter throttles API clients to 150 requests per hour. Since a single request to the user_timeline web service returns a maximum of 20 tweets, the hourly limit is exhausted after retrieving at most 3,000 (150 × 20) tweets, and further requests have to wait another hour until the limit is reset.
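The arithmetic above can be checked with a few lines of Python. This sketch only uses the two figures quoted in this section (20 tweets per request, 150 requests per hour); `fetch_plan` is an illustrative helper and ignores any other API rules.

```python
import math

TWEETS_PER_REQUEST = 20   # per-request maximum quoted above
REQUESTS_PER_HOUR = 150   # hourly limit quoted above


def fetch_plan(total_tweets):
    """Return (requests needed, hourly windows needed) for total_tweets."""
    requests = math.ceil(total_tweets / TWEETS_PER_REQUEST)
    windows = math.ceil(requests / REQUESTS_PER_HOUR)
    return requests, windows


print(TWEETS_PER_REQUEST * REQUESTS_PER_HOUR)  # 3000 tweets per hour at most
print(fetch_plan(3200))  # (160, 2): the full 3,200 tweets need two hourly windows
```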

To overcome that limitation and continue retrieving the remaining tweets in a non-hacky fashion, you can specify the page number from which requests should resume:

$ twitbak.py -p 121 username

The above will resume fetching tweets from the 121st page of the timeline.
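A sketch of a page loop that spends at most one hour's request budget and reports where to resume. `fetch_page` is a hypothetical stand-in for a single user_timeline request (returning an empty list past the end of the timeline); the real script's loop may differ.

```python
def fetch_pages(fetch_page, start_page=1, max_requests=150):
    """Fetch timeline pages starting at start_page.

    fetch_page(page) is a hypothetical stand-in for one user_timeline request;
    it returns a list of tweets, empty once the timeline is exhausted.
    Returns (tweets, resume_page): resume_page is None when the timeline ended,
    otherwise the page number to pass to -p on the next run.
    """
    tweets = []
    page = start_page
    for _ in range(max_requests):
        batch = fetch_page(page)
        if not batch:
            return tweets, None
        tweets.extend(batch)
        page += 1
    return tweets, page


# Fake timeline of 3,200 tweets served 20 per page (160 pages).
timeline = list(range(3200))


def fake_page(page):
    return timeline[(page - 1) * 20: page * 20]


first, resume = fetch_pages(fake_page)  # uses the whole hourly budget
print(len(first), resume)               # 3000 151
rest, done = fetch_pages(fake_page, start_page=resume)
print(len(rest), done)                  # 200 None
```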

Output file format

Tweets fetched by twitbak are stored in a tab-separated values file, one line per tweet, which looks like this:

tweet body[TAB]tweet date[TAB]tweet id

The first two columns are a no-brainer, while the tweet ID is kept for reference and to make twitbak work nicely in auto-mode.
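A minimal sketch of writing and reading one line in this format. The README does not say how twitbak handles tabs or line breaks inside a tweet body; replacing them with spaces here is an assumption made so every tweet stays on one line with exactly three columns. The date string and ID in the demo are made up.

```python
def to_line(body, date, tweet_id):
    """Format one tweet as 'body<TAB>date<TAB>id'.

    Tabs and line breaks in the body are replaced with spaces (an assumption),
    keeping one tweet per line and exactly three columns.
    """
    clean = body.replace("\t", " ").replace("\r", " ").replace("\n", " ")
    return f"{clean}\t{date}\t{tweet_id}"


def from_line(line):
    """Parse a saved line back into (body, date, id)."""
    body, date, tweet_id = line.rstrip("\n").rsplit("\t", 2)
    return body, date, int(tweet_id)


line = to_line("Backing up\tmy tweets\nwith twitbak", "2011-05-01 10:00:00", 64718238)
print(line.count("\t"))  # 2
print(from_line(line))   # ('Backing up my tweets with twitbak', '2011-05-01 10:00:00', 64718238)
```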

WTFs

Q: Hey, I've posted 500 tweets to my timeline but the tweets.txt file only has 200 lines. WTF?!

A: This is most likely because twitbak ignores replies by default. Backing up your replies without the tweets they answer would be like recording only one side of a telephone conversation, which makes very little sense. You can still include replies by using the -r option.

Author

Hi, my name is Dawid Lorenz. I am a web developer with a strong PHP background who was recently struck by Python/Django love. I treat twitbak as a way to expand my Python experience and create something genuinely useful at the same time.