Compares the number of tokens in the files under a data directory when they are tokenized with different encodings.

Usage:
    python compare_encodings.py [data_dir] [exclude_extensions]

Example:
    python compare_encodings.py data/ txt
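
The comparison described above might look roughly like the following sketch. It is not the actual contents of compare_encodings.py. The three tokenizers in ENCODERS are hypothetical stand-ins (whitespace split, characters, UTF-8 bytes) chosen so the example runs with the standard library only; a real tokenizer library's encodings could be swapped in. The names count_tokens, compare_directory and main are invented for illustration. Treating exclude_extensions as a comma-separated list of extensions to skip is also an assumption.

```python
import sys
from pathlib import Path

# Hypothetical stand-in "encodings": each maps text to a list of tokens.
# Replace these with real tokenizer encodings as needed.
ENCODERS = {
    "whitespace": lambda text: text.split(),
    "characters": lambda text: list(text),
    "utf8_bytes": lambda text: list(text.encode("utf-8")),
}


def count_tokens(text):
    """Return {encoder name: token count} for one string."""
    return {name: len(encode(text)) for name, encode in ENCODERS.items()}


def compare_directory(data_dir, exclude_extensions=()):
    """Sum token counts per encoder over every file under data_dir.

    Files whose extension is listed in exclude_extensions are skipped
    (assumed meaning of the second command-line argument).
    """
    excluded = {ext.lower().lstrip(".") for ext in exclude_extensions}
    totals = dict.fromkeys(ENCODERS, 0)
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower().lstrip(".") in excluded:
            continue
        # Undecodable bytes are replaced rather than raising an error.
        text = path.read_text(encoding="utf-8", errors="replace")
        for name, count in count_tokens(text).items():
            totals[name] += count
    return totals


def main(argv):
    """Print one total per encoder for the given directory."""
    data_dir = argv[0]
    # Assumption: several extensions may be given as "txt,md".
    exclude = argv[1].split(",") if len(argv) > 1 else []
    for name, total in compare_directory(data_dir, exclude).items():
        print(f"{name}: {total}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python compare_encodings.py data_dir [exclude_extensions]")
    else:
        main(sys.argv[1:])
```

Under these assumptions, calling compare_directory("data/", ["txt"]) returns a dictionary with one total per encoder, counting every file under data/ except those ending in .txt.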