Comparing changes

base repository: UKPLab/EasyNMT
base: v1.0.2
head repository: UKPLab/EasyNMT
compare: main

Commits on Jan 29, 2021

  1. 98fdbcd
  2. 6e30bba

Commits on Jan 31, 2021

  1. docker image
     nreimers committed Jan 31, 2021 (c091d17)
  2. 2c626a7

Commits on Feb 1, 2021

  1. Move docker files
     nreimers committed Feb 1, 2021 (d32b9be)
  2. Merge pull request #4 from hugoabonizio/main
     Add sampling arguments
     nreimers authored Feb 1, 2021 (37ec20f)
  3. commit
     nreimers committed Feb 1, 2021 (3c9f959)
  4. fe52d2d
  5. 33f289b
  6. Version 1.1.0
     nreimers committed Feb 1, 2021 (50d859b)

Commits on Feb 5, 2021

  1. dea2a35

Commits on Feb 7, 2021

  1. add max len parameter
     nreimers committed Feb 7, 2021 (92d6742)
  2. update docker
     nreimers committed Feb 7, 2021 (f9ad534)
  3. update docker
     nreimers committed Feb 7, 2021 (db9a17d)

Commits on Mar 10, 2021

  1. Update OpusMT.py
     Fix #19
     nreimers authored Mar 10, 2021 (27bef67)

Commits on Mar 15, 2021

  1. update docker
     nreimers committed Mar 15, 2021 (2825036)

Commits on Mar 16, 2021

  1. update docker
     nreimers committed Mar 16, 2021 (674213b)
  2. update docker
     nreimers committed Mar 16, 2021 (f55578f)
  3. update docker
     nreimers committed Mar 16, 2021 (35275e6)
  4. add colab rest API example
     nreimers committed Mar 16, 2021 (36e257e)

Commits on Mar 17, 2021

  1. Update README.md
     nreimers authored Mar 17, 2021 (1eaa248)

Commits on Mar 31, 2021

  1. 582e80e

Commits on Apr 9, 2021

  1. Update README.md
     nreimers authored Apr 9, 2021 (f19e928)
  2. 7ab3731
  3. 94428e5

Commits on Apr 16, 2021

  1. Update EasyNMT.py
     Bugfix #24
     nreimers authored Apr 16, 2021 (46cf8e4)

Commits on Apr 21, 2021

  1. update lang pairs
     nreimers committed Apr 21, 2021 (63fc82a)
  2. 61fcf71

Commits on Apr 26, 2021

  1. 9b69703
  2. Update
     nreimers committed Apr 26, 2021 (f3af26a)
  3. new cachefolder path
     nreimers committed Apr 26, 2021 (1b562d0)
  4. update docker files
     nreimers committed Apr 26, 2021 (aaf0acd)
  5. ce8061f
  6. rename folder names
     nreimers committed Apr 26, 2021 (33bd680)
  7. update readme
     nreimers committed Apr 26, 2021 (1ac9876)
  8. 2fa1564
  9. acad45c

Commits on May 2, 2021

  1. Update README.md
     Update performance numbers with new version 2 (models run via huggingface transformers vs. fairseq)
     nreimers authored May 2, 2021 (2c2533d)
  2. Update README.md
     nreimers authored May 2, 2021 (9c69105)

Commits on Aug 5, 2021

  1. 5ea48f5

Commits on May 27, 2022

  1. Add protobuf as dependency
     nreimers committed May 27, 2022 (c4ed343)
  2. Update docker
     nreimers committed May 27, 2022 (3037171)

Commits on Aug 15, 2022

  1. AutoModel: added the possibility to set the max_length param of the tokenizer
     It is important to have the possibility to tune this parameter to avoid OOM.
     g.racic committed Aug 15, 2022 (6cd3b64)
  2. Merge pull request #75 from nateagr/main
     AutoModel: added the possibility to set the max_length param of the tokenizer
     nreimers authored Aug 15, 2022 (7c11ae8)

Commits on Dec 19, 2023

  1. Fixed typo
     Changed mBERT to mBART in one of the headers
     FilipRank authored Dec 19, 2023 (78db9da)

Commits on Dec 21, 2023

  1. Merge pull request #98 from Roosterington/patch-1
     Fixed typo
     nreimers authored Dec 21, 2023 (d9db97f)

Showing with 1,386 additions and 1,823 deletions.
  1. 0 LICENSE
  2. 0 NOTICE.txt
  3. +33 −12 README.md
  4. +83 −0 docker/README.md
  5. +50 −0 docker/api/cpu.dockerfile
  6. +49 −0 docker/api/cuda10.1.dockerfile
  7. +48 −0 docker/api/cuda11.0.dockerfile
  8. +50 −0 docker/api/cuda11.1.dockerfile
  9. +50 −0 docker/api/cuda11.3.dockerfile
  10. +50 −0 docker/api/gunicorn_conf_backend.py
  11. +50 −0 docker/api/gunicorn_conf_frontend.py
  12. +2 −0 docker/api/requirements.txt
  13. +162 −0 docker/api/src/main.py
  14. +35 −0 docker/api/start.sh
  15. +28 −0 docker/api/start_backend.sh
  16. +19 −0 docker/api/start_frontend.sh
  17. +19 −0 docker/build-docker-hub.sh
  18. +29 −0 docker/examples/php_query_api.php
  19. +58 −0 docker/examples/python_query_api.py
  20. +350 −0 docker/examples/vue_js_frontend.html
  21. +91 −19 easynmt/EasyNMT.py
  22. +2 −2 easynmt/__init__.py
  23. +67 −0 easynmt/models/AutoModel.py
  24. +0 −236 easynmt/models/Fairseq.py
  25. +12 −15 easynmt/models/OpusMT.py
  26. +2 −0 easynmt/models/__init__.py
  27. +13 −1 easynmt/util.py
  28. 0 examples/test_all_models.py
  29. +5 −2 examples/test_mutli_process_translation.py
  30. 0 examples/test_translation_speed.py
  31. 0 examples/translation_document.py
  32. 0 examples/translation_multi_gpu.py
  33. 0 examples/translation_sentences.py
  34. 0 examples/translation_streaming.py
  35. +0 −525 models/m2m_100_1.2B/config.yaml
  36. +0 −10 models/m2m_100_1.2B/easynmt.json
  37. +8 −0 models/m2m_100_1.2b/easynmt.json
  38. +0 −525 models/m2m_100_418M/config.yaml
  39. +0 −10 models/m2m_100_418M/easynmt.json
  40. +8 −0 models/m2m_100_418m/easynmt.json
  41. +0 −449 models/mbart50_m2m/config.yaml
  42. +8 −9 models/mbart50_m2m/easynmt.json
  43. +1 −1 models/opus-mt/easynmt.json
  44. 0 setup.cfg
  45. +4 −7 setup.py
LICENSE (file mode changed: 100755 → 100644, no content changes)
NOTICE.txt (file mode changed: 100755 → 100644, no content changes)
45 changes: 33 additions & 12 deletions README.md
100755 → 100644
@@ -15,10 +15,28 @@ At the moment, we provide the following models:


**Examples:**
- [EasyNMT Google Colab Example](https://colab.research.google.com/drive/1X47vgSiOphpxS5w_LPtjQgJmiSTNfRNC?usp=sharing)
- [EasyNMT Opus-MT Online Demo](http://easynmt.net/demo)
- [EasyNMT Google Colab Example](https://colab.research.google.com/drive/1X47vgSiOphpxS5w_LPtjQgJmiSTNfRNC?usp=sharing) - Step-by-step example of how to use EasyNMT with Python.
- [EasyNMT Opus-MT Online Demo](http://easynmt.net/demo) - Demo to test the translation quality of the Opus-MT model.
- [EasyNMT Google Colab REST API Hosting](https://colab.research.google.com/drive/1kAh_Vt1ipA5-BuoaPX39rCIHFrhpcRpW?usp=sharing) - Example of how to host a translation REST API on Google Colab using the free GPU.

## Installation

## Docker & REST-API

We provide ready-to-use Docker images that wrap EasyNMT in a REST API:
```
docker run -p 24080:80 easynmt/api:2.0-cpu
```

Calling the REST API:
```
http://localhost:24080/translate?target_lang=en&text=Hallo%20Welt
```

See [docker/](docker/) for more information on the different Docker images and the REST API endpoints.

Also check our [EasyNMT Google Colab REST API Hosting](https://colab.research.google.com/drive/1kAh_Vt1ipA5-BuoaPX39rCIHFrhpcRpW?usp=sharing) example on how to use Google Colab and the free GPU to host a translation API.

## Installation for Python
You can install the package via:

@@ -80,16 +98,19 @@ print(model.translate(sentences, target_lang='en'))
# Available Models
The following models are currently available. They provide translations between 150+ languages.
| Model | Reference | #Languages | Size | Speed GPU (Sentences/Sec on V100) | Speed CPU (Sentences/Sec) | Comment |
| --- | --- | :---: | :---: | :---: | :---: | --- |
| opus-mt | [Helsinki-NLP](https://github.com/Helsinki-NLP/Opus-MT) | 186 | 300 MB | 53 | 6 | Individual models (~300 MB) per translation direction
| mbart50_m2m | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/multilingual) | 52 | 1.2 GB | 35 | 0.9|
| m2m_100_418M | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) | 100 | 0.9 GB | 39 | 1.1 |
| m2m_100_1.2B | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) | 100 | 2.4 GB | 23 |0.5 |
| opus-mt | [Helsinki-NLP](https://github.com/Helsinki-NLP/Opus-MT) | 186 | 300 MB | 50 | 6 | Individual models (~300 MB) per translation direction
| mbart50_m2m | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/multilingual) | 52 | 2.3 GB | 25 | - |
| mbart50_m2en | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/multilingual) | 52 | 2.3 GB | 25 | - | Can only translate from the other languages to English.
| mbart50_en2m | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/multilingual) | 52 | 2.3 GB | 25 | - | Can only translate from English to the other languages.
| m2m_100_418M | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) | 100 | 1.8 GB | 22 | - |
| m2m_100_1.2B | [Facebook Research](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) | 100 | 5.0 GB | 13 | - |
## Translation Quality
@@ -110,9 +131,9 @@ model = EasyNMT('opus-mt', max_loaded_models=10)
The system will automatically detect the suitable Opus-MT model and load it. With the optional parameter `max_loaded_models` you can specify the maximum number of models that are simultaneously loaded. If you then translate with an unseen language direction, the oldest model is unloaded and the new model is loaded.
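For illustration, a minimal sketch of this behaviour, following the call shown in the hunk header above (the example sentences and language pairs are made up):

```python
from easynmt import EasyNMT

# Keep at most 10 direction-specific Opus-MT models loaded at the same time.
# When an unseen language direction is requested, the oldest model is unloaded.
model = EasyNMT('opus-mt', max_loaded_models=10)

print(model.translate('Hallo Welt', target_lang='en'))        # German -> English
print(model.translate('Bonjour le monde', target_lang='de'))  # French -> German
```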
## mBERT_50
## mBART_50
We provide a wrapper for the [mBART50](https://arxiv.org/abs/2008.00401) model from Facebook, that is able to translate between any pair of 50+ languages.
We provide a wrapper for the [mBART50](https://arxiv.org/abs/2008.00401) model from Facebook, that is able to translate between any pair of 50+ languages. There are also models available to translate from English to these languages or vice versa.
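As an illustrative sketch, loading the multilingual model follows the same pattern as above; the sentence and language codes here are made up:

```python
from easynmt import EasyNMT

model = EasyNMT('mbart50_m2m')

# source_lang is optional; without it the language is detected automatically
print(model.translate('Maschinelles Lernen ist spannend.', source_lang='de', target_lang='fr'))
```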
@@ -135,8 +156,8 @@ We provide a wrapper for the [M2M 100](https://arxiv.org/abs/2010.11125) model f
At the moment, we provide wrappers for two M2M 100 models:
- **m2m_100_418M**: M2M model with 418 million parameters (0.9 GB)
- **m2m_100_1.2B**: M2M model with 1.2 billion parameters (2.4 GB)
- **m2m_100_418M**: M2M model with 418 million parameters (1.8 GB)
- **m2m_100_1.2B**: M2M model with 1.2 billion parameters (5.0 GB)
**Usage:**
@@ -151,7 +172,7 @@ As soon as you call `EasyNMT('m2m_100_418M')` / `EasyNMT('m2m_100_1.2B')`, the r
## Author
Contact person: [Nils Reimers](https://www.nils-reimers.de); [reimers@ukp.informatik.tu-darmstadt.de](mailto:reimers@ukp.informatik.tu-darmstadt.de)
Contact person: [Nils Reimers](https://www.nils-reimers.de); [info@nils-reimers.de](mailto:info@nils-reimers.de)
https://www.ukp.tu-darmstadt.de/
83 changes: 83 additions & 0 deletions docker/README.md
@@ -0,0 +1,83 @@
# Docker

We provide a [Docker](https://www.docker.com/)-based REST API for EasyNMT: send a query with your source text, and the API returns the translated text.

## Setup

To start the EasyNMT REST API on port `24080`, run the following Docker command:
```
docker run -p 24080:80 easynmt/api:2.0-cpu
```

This uses the CPU image. If you have a **GPU (CUDA)**, there are various GPU images available; have a look at our [Docker Hub page](https://hub.docker.com/r/easynmt/api/tags?page=1&ordering=last_updated).


## Usage

After you have started the Docker image, you can visit: [http://localhost:24080/translate?target_lang=en&text=Hallo%20Welt](http://localhost:24080/translate?target_lang=en&text=Hallo%20Welt)

This should yield the following JSON:
```
{
"target_lang": "en",
"source_lang": null,
"detected_langs": [
"de"
],
"translated": [
"Hello world"
],
"translation_time": 0.163145542144775
}
```
If you have started it with a different port, replace `24080` with the port you chose.

Note: for the first translation, the respective models are downloaded, which might take some time. Consecutive calls will be faster.
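For a quick scripted check, a small sketch using Python's `requests` package against the endpoint shown above (the URL parameters and response fields mirror the example; adjust the port if you changed it):

```python
import requests

# Query the running EasyNMT REST API (GET /translate)
resp = requests.get(
    'http://localhost:24080/translate',
    params={'target_lang': 'en', 'text': 'Hallo Welt'},
)
resp.raise_for_status()
data = resp.json()

print(data['detected_langs'])  # e.g. ['de']
print(data['translated'])      # e.g. ['Hello world']
```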

## Programmatic Usage
- **Python:** [python_query_api.py](examples/python_query_api.py) - Sending requests with Python to the EasyNMT Docker API.
- **Vue.js:** [vue_js_frontend.html](examples/vue_js_frontend.html) - Vue.js code for our [demo](http://easynmt.net/demo/).

## Documentation

To get an overview of all REST API endpoints, with all possible parameters and their descriptions, open the following URL: [http://localhost:24080/docs](http://localhost:24080/docs)

### Endpoints
The following endpoints are defined for the GET method (i.e., you can call them like `http://localhost:24080/name?param1=val1&param2=val2`):

```
/translate
Translates the text to the given target language.
:param text: Text that should be translated
:param target_lang: Target language
:param source_lang: Language of text. Optional, if empty: Automatic language detection
:param beam_size: Beam size. Optional
:param perform_sentence_splitting: Split longer documents into individual sentences for translation. Optional
:return: Returns a json with the translated text
/language_detection
Detects the language for the provided text
:param text: A single text for which we want to know the language
:return: The detected language
/get_languages
Returns the languages the model supports
:param source_lang: Optional. Only return languages with this language as source
:param target_lang: Optional. Only return languages with this language as target
:return:
```

You can also call `/translate` and `/language_detection` with a POST request, which lets you pass a list of multiple texts. All texts are then translated and returned at once.
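A rough sketch of such a batched POST call is below. The exact request body schema is not spelled out in this README, so treating the GET parameters as a JSON body with `text` as a list is an assumption; check [http://localhost:24080/docs](http://localhost:24080/docs) for the authoritative format:

```python
import requests

# Assumed body format: same fields as the GET parameters, with "text" as a list.
# Verify the exact schema at http://localhost:24080/docs before relying on this.
payload = {
    'text': ['Hallo Welt', 'Guten Morgen'],
    'target_lang': 'en',
}
resp = requests.post('http://localhost:24080/translate', json=payload)
resp.raise_for_status()
print(resp.json()['translated'])
```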

### Environment Variables
You can control the Docker image using various environment variables:
- *MAX_WORKERS_BACKEND*: Number of worker processes for the translation. Default: 1
- *MAX_WORKERS_FRONTEND*: Number of worker processes for language detection & model info. Default: 2
- *EASYNMT_MODEL*: Which EasyNMT Model to load. Default: opus-mt
- *EASYNMT_MODEL_ARGS*: JSON-encoded string with parameters used when loading the EasyNMT model. Default: {}
- *EASYNMT_MAX_TEXT_LEN*: Maximal text length for translation. Default: Not set
- *EASYNMT_MAX_BEAM_SIZE*: Maximal beam size for translation. Default: Not set
- *EASYNMT_BATCH_SIZE*: Batch size for translation. Default: 16
- *TIMEOUT*: [Gunicorn timeout](https://docs.gunicorn.org/en/stable/settings.html#timeout). Default: 120

All model files are stored at `/cache/`. You can mount this path to your host machine if you want to re-use previously downloaded models.
50 changes: 50 additions & 0 deletions docker/api/cpu.dockerfile
@@ -0,0 +1,50 @@
FROM python:3.8-slim
LABEL maintainer="Nils Reimers <info@nils-reimers>"

RUN apt-get update && apt-get install -y procps
RUN pip install --no-cache-dir torch==1.8.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

###################################### Same code for all docker files ###############

## Install dependencies
RUN apt-get update && apt-get -y install build-essential
RUN pip install --no-cache-dir "uvicorn[standard]" gunicorn fastapi
COPY ./requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
RUN python -m nltk.downloader 'punkt'

#### Scripts to start front- and backend worker

COPY ./start_backend.sh /start_backend.sh
RUN chmod +x /start_backend.sh

COPY ./start_frontend.sh /start_frontend.sh
RUN chmod +x /start_frontend.sh

COPY ./start.sh /start.sh
RUN chmod +x /start.sh

COPY ./gunicorn_conf_backend.py /gunicorn_conf_backend.py
COPY ./gunicorn_conf_frontend.py /gunicorn_conf_frontend.py

#### Working dir

COPY ./src /app
WORKDIR /app/
ENV PYTHONPATH=/app
EXPOSE 80

####

# Create cache folders
RUN mkdir /cache
RUN mkdir /cache/easynmt
RUN mkdir /cache/transformers
RUN mkdir /cache/torch

ENV EASYNMT_CACHE=/cache/easynmt
ENV TRANSFORMERS_CACHE=/cache/transformers
ENV TORCH_CACHE=/cache/torch

# Run start script
CMD ["/start.sh"]
49 changes: 49 additions & 0 deletions docker/api/cuda10.1.dockerfile
@@ -0,0 +1,49 @@
FROM pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
LABEL maintainer="Nils Reimers <info@nils-reimers>"

###################################### Same code for all docker files ###############

## Install dependencies
RUN apt-get update && apt-get -y install build-essential
RUN pip install --no-cache-dir "uvicorn[standard]" gunicorn fastapi
COPY ./requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
RUN python -m nltk.downloader 'punkt'

#### Scripts to start front- and backend worker

COPY ./start_backend.sh /start_backend.sh
RUN chmod +x /start_backend.sh

COPY ./start_frontend.sh /start_frontend.sh
RUN chmod +x /start_frontend.sh

COPY ./start.sh /start.sh
RUN chmod +x /start.sh

COPY ./gunicorn_conf_backend.py /gunicorn_conf_backend.py
COPY ./gunicorn_conf_frontend.py /gunicorn_conf_frontend.py

#### Working dir

COPY ./src /app
WORKDIR /app/
ENV PYTHONPATH=/app
EXPOSE 80

####

# Create cache folders
RUN mkdir /cache/
RUN mkdir /cache/easynmt
RUN mkdir /cache/transformers
RUN mkdir /cache/torch

ENV EASYNMT_CACHE=/cache/easynmt
ENV TRANSFORMERS_CACHE=/cache/transformers
ENV TORCH_CACHE=/cache/torch

# Run start script
CMD ["/start.sh"]


48 changes: 48 additions & 0 deletions docker/api/cuda11.0.dockerfile
@@ -0,0 +1,48 @@
FROM pytorch/pytorch:1.7.1-cuda11.0-cudnn8-runtime
LABEL maintainer="Nils Reimers <info@nils-reimers>"

###################################### Same code for all docker files ###############

## Install dependencies
RUN apt-get update && apt-get -y install build-essential
RUN pip install --no-cache-dir "uvicorn[standard]" gunicorn fastapi
COPY ./requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
RUN python -m nltk.downloader 'punkt'

#### Scripts to start front- and backend worker

COPY ./start_backend.sh /start_backend.sh
RUN chmod +x /start_backend.sh

COPY ./start_frontend.sh /start_frontend.sh
RUN chmod +x /start_frontend.sh

COPY ./start.sh /start.sh
RUN chmod +x /start.sh

COPY ./gunicorn_conf_backend.py /gunicorn_conf_backend.py
COPY ./gunicorn_conf_frontend.py /gunicorn_conf_frontend.py

#### Working dir

COPY ./src /app
WORKDIR /app/
ENV PYTHONPATH=/app
EXPOSE 80

####

# Create cache folders
RUN mkdir /cache/
RUN mkdir /cache/easynmt
RUN mkdir /cache/transformers
RUN mkdir /cache/torch

ENV EASYNMT_CACHE=/cache/easynmt
ENV TRANSFORMERS_CACHE=/cache/transformers
ENV TORCH_CACHE=/cache/torch

# Run start script
CMD ["/start.sh"]

50 changes: 50 additions & 0 deletions docker/api/cuda11.1.dockerfile
@@ -0,0 +1,50 @@
FROM pytorch/pytorch:1.8.0-cuda11.1-cudnn8-runtime
LABEL maintainer="Nils Reimers <info@nils-reimers>"

###################################### Same code for all docker files ###############

## Install dependencies
RUN apt-get update && apt-get -y install build-essential
RUN pip install --no-cache-dir "uvicorn[standard]" gunicorn fastapi
COPY ./requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt
RUN python -m nltk.downloader 'punkt'

#### Scripts to start front- and backend worker

COPY ./start_backend.sh /start_backend.sh
RUN chmod +x /start_backend.sh

COPY ./start_frontend.sh /start_frontend.sh
RUN chmod +x /start_frontend.sh

COPY ./start.sh /start.sh
RUN chmod +x /start.sh

COPY ./gunicorn_conf_backend.py /gunicorn_conf_backend.py
COPY ./gunicorn_conf_frontend.py /gunicorn_conf_frontend.py

#### Working dir

COPY ./src /app
WORKDIR /app/
ENV PYTHONPATH=/app
EXPOSE 80

####

# Create cache folders
RUN mkdir /cache/
RUN mkdir /cache/easynmt
RUN mkdir /cache/transformers
RUN mkdir /cache/torch

ENV EASYNMT_CACHE=/cache/easynmt
ENV TRANSFORMERS_CACHE=/cache/transformers
ENV TORCH_CACHE=/cache/torch

# Run start script
CMD ["/start.sh"]


