
v1.0.2 - Alternative Language Detection Methods

nreimers released this on 29 Jan 10:09

EasyNMT uses fastText for automatic language detection, as it provides the highest speed and best accuracy.

However, fastText can be complicated to install on Windows because building it requires a C/C++ compiler.

This release adds two alternative language identifiers:

  • [langid](https://github.com/saffsd/langid.py) - Can be installed via pip install langid
  • langdetect - Can be installed via pip install langdetect

If fastText is not available, langid or langdetect (whichever is installed) is used for language detection instead.
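The fallback described above can be pictured with the following minimal sketch. It is an illustration, not EasyNMT's own code: the helper names `pick_backend` and `detect_language`, the model filename `lid.176.ftz`, and checking langid before langdetect are all assumptions. The calls `fasttext.load_model`, `model.predict`, `langid.classify` and `langdetect.detect` are the public APIs of those packages.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical path to a pre-trained fastText language-ID model; it has to be
# downloaded separately (assumption, not part of these release notes).
FASTTEXT_MODEL_PATH = "lid.176.ftz"

# Preference order: fastText first, then the pure-Python fallbacks.
# Checking langid before langdetect is an assumption of this sketch.
BACKENDS = ("fasttext", "langid", "langdetect")


def _is_installed(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_backend(is_installed: Callable[[str], bool] = _is_installed) -> Optional[str]:
    """Return the first available detection package, or None if none is installed."""
    for name in BACKENDS:
        if is_installed(name):
            return name
    return None


def detect_language(text: str) -> str:
    """Return a language code such as 'en' using the first available backend."""
    text = text.strip().replace("\n", " ")  # fastText's predict() rejects newlines
    backend = pick_backend()
    if backend == "fasttext":
        import fasttext
        model = fasttext.load_model(FASTTEXT_MODEL_PATH)
        labels, _probs = model.predict(text)
        return labels[0].replace("__label__", "")
    if backend == "langid":
        import langid
        lang, _score = langid.classify(text)
        return lang
    if backend == "langdetect":
        from langdetect import detect
        return detect(text)
    raise RuntimeError("Install fasttext, langid or langdetect for language detection.")
```

In practice the fastText model would be loaded once and cached rather than reloaded on every call; it is reloaded here only to keep the sketch short.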

For installation on Windows, you can run the following commands:

```
pip install --no-deps easynmt
pip install tqdm transformers numpy nltk sentencepiece langid
```

In addition, you have to install PyTorch as described here:
https://pytorch.org/get-started/locally/
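Because `--no-deps` makes pip skip dependency resolution, it can help to confirm afterwards that every package is importable. The script below is a minimal sketch of such a check; the module list simply mirrors the commands above plus PyTorch (`torch`) and is not read from EasyNMT's own requirements, and the helper name `missing_modules` is hypothetical.

```python
import importlib.util
from typing import Callable, List, Optional, Sequence

# Modules installed by the manual Windows commands above, plus PyTorch.
# This list is an assumption based on those commands only.
REQUIRED = ["easynmt", "tqdm", "transformers", "numpy", "nltk",
            "sentencepiece", "langid", "torch"]


def missing_modules(names: Sequence[str] = REQUIRED,
                    is_installed: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return the names from `names` that cannot be found on this Python path."""
    if is_installed is None:
        # find_spec locates a top-level module without importing it.
        is_installed = lambda name: importlib.util.find_spec(name) is not None
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```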

If you want to install fastText on Windows, I recommend the conda-forge package:
https://anaconda.org/conda-forge/fasttext