Generate graphs and statistics from your exported Telegram messages.
First you need to export your Telegram data to a result.json
file. You can do this in the settings of the Telegram desktop client.
./telegram-statistics.py -i result.json -n "name"
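As a rough illustration of what selecting a chat by name could look like, the sketch below searches a parsed export for a chat whose name matches the -n argument. The "chats" / "list" / "messages" nesting and the field names are assumptions about the export layout, and find_chat is a hypothetical helper, not a function taken from the script.

```python
import json

# Hypothetical, heavily trimmed stand-in for an exported result.json.
# The "chats" -> "list" -> "messages" nesting is an assumption about the layout.
SAMPLE_EXPORT = json.loads("""
{
  "chats": {
    "list": [
      {"name": "Name Surname", "type": "personal_chat",
       "messages": [
         {"id": 1, "type": "message", "date": "2018-01-01T10:00:00",
          "from": "Name Surname", "text": "hi"},
         {"id": 2, "type": "message", "date": "2018-01-01T10:01:00",
          "from": "Me", "text": "hello"}
       ]},
      {"name": "Someone Else", "type": "personal_chat", "messages": []}
    ]
  }
}
""")


def find_chat(export, name):
    """Return the first chat whose name equals `name`, or None if absent."""
    for chat in export.get("chats", {}).get("list", []):
        if chat.get("name") == name:
            return chat
    return None


chat = find_chat(SAMPLE_EXPORT, "Name Surname")
print(len(chat["messages"]))
```

With a real export, the dictionary would come from json.load() on the opened result.json instead of the inline sample.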
Open the file result_2019-05-30.json and parse the chat history with Name Surname, starting from 2018-01-01 up to now, and generate the substring plot for the emojis "😘💗💙💓🧡😘💕😚😍🥰":
./telegram-statistics.py -i ../result_2019-05-30.json -n "Name Surname" -d 2018-01-01 -w "😘;💗;💙;💓;🧡;😘;💕;😚;😍;🥰"
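The -d and -w options in the command above can be pictured with the following minimal sketch, which splits a semicolon-separated list and counts combined occurrences per month from a start date. Both function names are hypothetical, and dropping repeated entries (such as the second 😘) is a choice made here to avoid double counting; the script itself may handle this differently.

```python
from collections import Counter
from datetime import datetime


def parse_substrings(arg):
    """Split a semicolon-separated list, dropping empty and repeated entries."""
    seen = []
    for part in arg.split(";"):
        if part and part not in seen:
            seen.append(part)
    return seen


def monthly_occurrences(messages, substrings, start):
    """Count combined substring occurrences per "YYYY-MM", from `start` on."""
    counts = Counter()
    for date, text in messages:
        if date < start:
            continue  # message is older than the requested start date
        hits = sum(text.count(s) for s in substrings)
        if hits:
            counts[date.strftime("%Y-%m")] += hits
    return counts


# Made-up sample messages as (timestamp, text) pairs.
messages = [
    (datetime(2017, 12, 31, 23, 0), "😘"),
    (datetime(2018, 1, 5, 9, 30), "good morning 😘💗"),
    (datetime(2018, 1, 6, 9, 30), "no emoji here"),
    (datetime(2018, 2, 1, 20, 0), "💗💗"),
]
substrings = parse_substrings("😘;💗;😘")
result = monthly_occurrences(messages, substrings, datetime(2018, 1, 1))
print(dict(result))
```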
The convert-whatsapp.py script converts a WhatsApp chat export (Whatsapp Chat with Name.txt) into a Telegram-style JSON format.
To find the correct [Name Surname], take the name from the first line of the WhatsApp export txt.
However, the WhatsApp export is not as detailed as the Telegram export, so many numbers cannot be calculated.
./convert-whatsapp.py -i "Whatsapp Chat with Name.txt"
./telegram-statistics.py -i whatsapp-result.json -n "Name Surname"
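Finding the name in the first line of the export could look like the sketch below. The line layout "date, time - Sender: text" is an assumption (WhatsApp exports differ by platform and locale), and LINE_PATTERN and first_sender are hypothetical names that do not come from convert-whatsapp.py.

```python
import re

# Assumed line layout: "<date>, <time> - <sender>: <text>".
# Real exports vary, so this pattern is only an illustration.
LINE_PATTERN = re.compile(r"^(?P<date>[^-]+?) - (?P<sender>[^:]+): (?P<text>.*)$")


def first_sender(lines):
    """Return the sender of the first line that looks like a chat message."""
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            return match.group("sender")
    return None


# Made-up sample lines; the first one is a system notice without a sender.
sample = [
    "31.12.18, 22:14 - Messages are end-to-end encrypted.",
    "31.12.18, 22:15 - Name Surname: Hello: there",
    "31.12.18, 22:16 - Me: Hi!",
]
print(first_sender(sample))
```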
Where "name" is the name displayed in Telegram (usually the surname).
The script generates multiple files.
- emojis.txt: contains unicode encoded emojis and their count
- raw_metrics.json: raw numerical data (contains all text of both persons / large file)
HTML Files (Plots):
- plot_hours.html: bokeh plot of message frequency over the hours of one day
- plot_month.html: bokeh plot of number of messages sent per month
- plot_month_characters.html: bokeh plot of characters sent per month
- plot_weekdays.html: bokeh plot of message frequency over one week
- plot_month_calls.html: bokeh plot of number of calls per month
- plot_month_call_time.html: bokeh plot of total seconds on call per month
- plot_month_photos.html: bokeh plot of number of photos sent per month
- plot_month_replytime.html: bokeh plot of average monthly reply time (Beta)
- plot_month_word_occurrence.html: bokeh plot of combined substring occurrences over time
Raw Files (one for each person):
- raw_months_person_Person A.csv: csv values of month data
- raw_weekdays_person_Person A.csv: csv values of weekday data
- raw_months_chars_person_Person A.csv: csv values of monthly character count data
- raw_monthly_pictures_person_Person A.csv: csv values of monthly picture count data
- raw_monthly_calls_person_Person A.csv: csv values of monthly number of calls
- raw_monthly_call_duration_person_Person A.csv: csv values of monthly call duration
- raw_monthly_time_to_reply_person_Person A.csv: csv values of monthly reply time
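Because the raw csv files use a semicolon as separator, they need an explicit delimiter when read back. The sketch below uses Python's csv module on made-up sample content; the two-column month/count layout is an assumption, and the real files may contain other columns or a header row.

```python
import csv
import io

# Hypothetical file content; the real column layout may differ.
SAMPLE = "2018-01;120\n2018-02;95\n"


def read_semicolon_csv(handle):
    """Read all non-empty rows from a semicolon-separated csv stream."""
    return [row for row in csv.reader(handle, delimiter=";") if row]


rows = read_semicolon_csv(io.StringIO(SAMPLE))
totals = {month: int(count) for month, count in rows}
print(totals)
```

For a file on disk, pass an open file handle (opened with newline="") instead of the io.StringIO object.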
- total number of messages
- total number of words
- total number of characters
- count occurrence of each word
- number of unique words
- total number of messages
- total number of words
- total number of characters
- average number of words per message
- average number of characters per message
- count occurrence of each word
- count occurrence of each emoji
- number of messages formatted with markdown
- number of messages of type [animation, audio_file, sticker, video_message, voice_message]
- number of photos
- number of unique words
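Several of the text metrics listed above can be derived from a plain list of message strings, as in this minimal sketch. Splitting on whitespace and lowercasing words are assumptions made here for illustration; the script may tokenize differently, and text_metrics is a hypothetical helper.

```python
from collections import Counter


def text_metrics(messages):
    """Compute simple word and character metrics for a list of message texts."""
    # Assumption: words are whitespace-separated and compared case-insensitively.
    words = [word for text in messages for word in text.lower().split()]
    total_chars = sum(len(text) for text in messages)
    count = len(messages)
    return {
        "total_messages": count,
        "total_words": len(words),
        "total_characters": total_chars,
        "avg_words_per_message": len(words) / count if count else 0.0,
        "avg_characters_per_message": total_chars / count if count else 0.0,
        "word_occurrence": Counter(words),
        "unique_words": len(set(words)),
    }


metrics = text_metrics(["Hello there", "hello again my friend"])
print(metrics["total_words"], metrics["unique_words"])
```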
python 3
bokeh
numpy
pandas
I was inspired to do this project by a post on reddit.com/r/LongDistance.
I would love to hear if you have made some statistics yourself. Feel free to message me on reddit.
If you want to implement new metrics, feel free to fork and send a pull request. Here are some things that I think could be improved or added:
- normalize weekly / hourly data to "average number" per day/hour instead of "total number"
- number of edited messages
- csv separator is currently a semicolon (;)
- other country specific errors (e.g. with dates)
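The normalization idea from the list above (average per day instead of total) could be approached as in this sketch for the weekday case: divide the message total of each weekday by how often that weekday occurs in the observed range. This is one possible approach rather than existing script behavior, and average_per_weekday is a hypothetical name.

```python
from collections import Counter
from datetime import date, timedelta


def average_per_weekday(message_dates, start, end):
    """Average messages per calendar day for each weekday (0 = Monday)."""
    totals = Counter(d.weekday() for d in message_dates if start <= d <= end)
    # Count how many Mondays, Tuesdays, ... lie in the inclusive range.
    days = Counter()
    current = start
    while current <= end:
        days[current.weekday()] += 1
        current += timedelta(days=1)
    return {weekday: totals[weekday] / days[weekday] for weekday in days}


# Made-up sample: two full weeks starting on Monday, 2018-01-01.
dates = [date(2018, 1, 1)] * 3 + [date(2018, 1, 8), date(2018, 1, 3)]
averages = average_per_weekday(dates, date(2018, 1, 1), date(2018, 1, 14))
print(averages[0], averages[2])
```

The same pattern would apply to hourly data by dividing each hour's total by the number of days in the range.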
MIT License
Copyright (c) 2018 Simon Burkhardt