Export Telegram messages, using telegram-cli. Patched version recommended.
This version (v3) is compatible with vysheng/tg/master
AND vysheng/tg/test
branches.
Note: The database format of this version (v3) is not compatible with the old ones.
To convert old databases (v1 or v2), run python3 dbconvert.py [old.db [new.db]]
$ python3 export.py -h
usage: export.py [-h] [-o OUTPUT] [-d DB] [-f] [-p PEER] [-B] [-t TIMEOUT]
[-l] [-L] [-e TGBIN] [-v]
Export Telegram messages.
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
output path
-d DB, --db DB database path
-f, --force force download all messages
-p PEER, --peer PEER only download messages for this peer (format:
channel#id1001234567, or use partial name/title as
shown in tgcli)
-B, --batch-only fetch messages in batch only, don't try to get more
missing messages
-t TIMEOUT, --timeout TIMEOUT
tg-cli command timeout
-l, --logging logging mode (keep running)
-L, --keep-logging first export, then keep logging
-e TGBIN, --tgbin TGBIN
telegram-cli binary path
-v, --verbose print debug messages
Lots of workaround about the unreliability of tg-cli is included (in this script and tgcli.py
), so the script itself may be unreliable as well.
Common problems with tg-cli are:
- Dies arbitrarily.
- No response in the socket interface.
- Slow response in the socket interface.
- Half response in the socket interface, while the another half appears after the timeout.
- Returns an empty array when actually there are remaining messages.
Note: When it's trying to get the remaining messages, the telegram-cli will crash like crazy. That's due to non-existent messages. For a quick fix, use this fork of tg-cli.
Which is called NO WARRANTY™.
This script can process database written by export.py
or tg-chatdig, and write out a human-readable format (txt, html, etc.) according to a jinja2 template.
usage: logfmt.py [-h] [-o OUTPUT] [-d DB] [-b BOTDB] [-D BOTDB_DEST] [-u]
[-t TEMPLATE] [-P PEER_PRINT] [-l LIMIT] [-L HARDLIMIT]
[-c CACHEDIR] [-r URLPREFIX]
peer
Format exported database file into human-readable format.
positional arguments:
peer export certain peer id or tg-cli-style peer print name
optional arguments:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
output path
-d DB, --db DB tg-export database path
-b BOTDB, --botdb BOTDB
tg-chatdig bot database path
-D BOTDB_DEST, --botdb-dest BOTDB_DEST
tg-chatdig bot logged chat id or tg-cli-style peer
name
-u, --botdb-user use user information in tg-chatdig database first
-t TEMPLATE, --template TEMPLATE
export template, can be 'txt'(default), 'html',
'json', or template file name
-P PEER_PRINT, --peer-print PEER_PRINT
set print name for the peer
-l LIMIT, --limit LIMIT
limit the number of fetched messages and set the
offset
-L HARDLIMIT, --hardlimit HARDLIMIT
set a hard limit of the number of messages, must be
used with -l
-c CACHEDIR, --cachedir CACHEDIR
the path of media files
-r URLPREFIX, --urlprefix URLPREFIX
the url prefix of media files
Simple wrapper for telegram-cli interface.
Example:
tgcli = TelegramCliInterface('../tg/bin/telegram-cli')
dialogs = tgcli.cmd_dialog_list()
run()
starts the subprocess, needed when object created withrun=False
.send_command(cmd, timeout=180, resync=True)
sends a command to tg-cli. useresync
for consuming text since last timeout.cmd_*(*args, **kwargs)
is the convenience method to send a command and get response.args
are for the command,kwargs
are arguments forTelegramCliInterface.send_command
.on_info(text)
(callback) is called when a line of text is printed on stdout.on_json(obj)
(callback) is called with the interpreted object when a line of json is printed on stdout.on_text(text)
(callback) is called when a line of anything is printed on stdout.on_start()
(callback) is called after telegram-cli starts.on_exit()
(callback) is called after telegram-cli dies.close()
properly ends the subprocess.
do_nothing()
function does nothing. (for callbacks)
TelegramCliExited
exception is raised if telegram-cli dies when reading an answer.
Now it's LGPLv3+.