Viblo tutorial: https://viblo.asia/p/cung-tim-hieu-he-thong-dich-may-mang-no-ron-tu-dau-tu-bleu-score-den-beam-search-decoding-oK9VyxDXLQR
An English-Vietnamese Neural Machine Translation implementation from scratch with PyTorch.
Create a virtual environment, then install the required packages:

`python -m venv venv`

`venv\Scripts\activate` (on Windows; use `source venv/bin/activate` on Linux/macOS)

`pip install -r requirements.txt`
If you would like to use your own dataset and hyperparameters, there are two ways to do so:

1. Modify the default `config.yml` file: open `config.yml` and make the necessary adjustments to the variables.
2. Specify a custom configuration file: by default, the configuration file path is set to `config.yml`. However, if you wish to run different experiments simultaneously, you can create your own configuration file in YAML format and pass its path with the `--config` flag. Note that the code currently supports reading configuration files in the YAML format only. For instance: `python train.py --config my_config.yml`
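For illustration, the `--config` plumbing described above can be implemented with `argparse` and PyYAML; below is a minimal sketch under that assumption (the `load_config` helper is hypothetical, not the repository's actual code):

```python
import argparse

import yaml  # requires: pip install pyyaml

def load_config():
    # Parse the --config flag; defaults to config.yml as described above.
    parser = argparse.ArgumentParser(description="English-Vietnamese NMT")
    parser.add_argument("--config", type=str, default="config.yml",
                        help="Path to a YAML configuration file")
    args = parser.parse_args()

    # Read the YAML file into a plain dict of hyperparameters.
    with open(args.config, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

if __name__ == "__main__":
    config = load_config()
    print(config)
```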
The complete pipeline consists of three primary steps:

1. ⚙️ Preprocessing: reads the English-Vietnamese parallel corpora, tokenizes the sentences, builds the vocabularies, and maps the tokenized sentences to tensors. The resulting tensors are then saved into a DataLoader, along with the trained tokenizers containing the vocabulary for both languages (see the first sketch after this list). To run the preprocessing step: `python preprocess.py --config config.yml`
2. 🚄 Training and Validation: loads the prepared tokenizers and DataLoaders, builds a Seq2Seq model, and loads its checkpoint if one is available. Training then runs and the results are recorded in a CSV file (see the second sketch after this list). To train and validate the model: `python train.py --config config.yml`
3. 🧪 Testing: evaluates the pretrained model on the testing DataLoader. For each pair of sentences, the source, target, and predicted sentences are printed along with their scores (see the third sketch after this list). To run the testing step: `python test.py --config config.yml`
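To make step 1 concrete, here is a minimal sketch of the preprocessing idea: whitespace tokenization, vocabulary building, tensor encoding, and a DataLoader. The helper names and toy corpus are illustrative assumptions; the repository's actual tokenizers and data format may differ.

```python
# Minimal sketch of preprocessing: tokenize parallel sentences, build a
# vocabulary, encode tokens as padded tensors, and wrap them in a DataLoader.
from collections import Counter

import torch
from torch.utils.data import DataLoader

PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"

def build_vocab(sentences, min_freq=1):
    # Count whitespace tokens and assign an index to each frequent one.
    counter = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {PAD: 0, SOS: 1, EOS: 2, UNK: 3}
    for tok, freq in counter.items():
        if freq >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def encode(sentence, vocab, max_len=16):
    # Map tokens to indices, add <sos>/<eos>, and pad to a fixed length.
    toks = sentence.lower().split()[: max_len - 2]
    ids = [vocab[SOS]] + [vocab.get(t, vocab[UNK]) for t in toks] + [vocab[EOS]]
    ids += [vocab[PAD]] * (max_len - len(ids))
    return torch.tensor(ids)

# Toy parallel corpus standing in for the real English-Vietnamese data.
en = ["i love you", "good morning"]
vi = ["tôi yêu bạn", "chào buổi sáng"]
en_vocab, vi_vocab = build_vocab(en), build_vocab(vi)

pairs = [(encode(s, en_vocab), encode(t, vi_vocab)) for s, t in zip(en, vi)]
loader = DataLoader(pairs, batch_size=2, shuffle=True)

for src_batch, tgt_batch in loader:
    print(src_batch.shape, tgt_batch.shape)  # torch.Size([2, 16]) twice
```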
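For step 2, a minimal sketch of the train/validate loop is shown below, assuming a batch-first Seq2Seq model that maps `(src, tgt_input)` to per-token logits. The checkpoint format and CSV columns here are hypothetical, not the repository's exact code.

```python
# Minimal sketch of training with checkpoint resume and CSV logging.
import csv
import os

import torch
import torch.nn as nn

def train(model, train_loader, val_loader, num_epochs=10,
          ckpt_path="checkpoint.pt", log_path="train_log.csv", pad_idx=0):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)  # skip padding

    # Resume from a checkpoint if one is available.
    start_epoch = 0
    if os.path.exists(ckpt_path):
        state = torch.load(ckpt_path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        start_epoch = state["epoch"] + 1

    for epoch in range(start_epoch, num_epochs):
        model.train()
        train_loss = 0.0
        for src, tgt in train_loader:
            optimizer.zero_grad()
            # Teacher forcing: predict tgt[1:] from tgt[:-1].
            logits = model(src, tgt[:, :-1])
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             tgt[:, 1:].reshape(-1))
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for src, tgt in val_loader:
                logits = model(src, tgt[:, :-1])
                val_loss += criterion(logits.reshape(-1, logits.size(-1)),
                                      tgt[:, 1:].reshape(-1)).item()

        # Record the epoch's results in a CSV file.
        with open(log_path, "a", newline="") as f:
            csv.writer(f).writerow([epoch,
                                    train_loss / len(train_loader),
                                    val_loss / len(val_loader)])

        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "epoch": epoch}, ckpt_path)
```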
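For step 3, the sketch below prints each source, target, and predicted sentence with a sentence-level BLEU score. Greedy decoding is used for brevity (the tutorial linked above covers beam search), NLTK's `sentence_bleu` is used as one possible scorer, and all helper names are hypothetical.

```python
# Minimal sketch of testing: greedy decode, then score each prediction.
import torch
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def greedy_decode(model, src, sos_idx, eos_idx, max_len=32):
    # Repeatedly feed the model its own previous predictions.
    ys = torch.tensor([[sos_idx]])
    for _ in range(max_len - 1):
        logits = model(src, ys)                        # (1, len, vocab)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        ys = torch.cat([ys, next_tok], dim=1)
        if next_tok.item() == eos_idx:
            break
    return ys.squeeze(0).tolist()

def test(model, test_loader, src_itos, tgt_itos, sos_idx=1, eos_idx=2):
    model.eval()
    smooth = SmoothingFunction().method1  # avoid zero scores on short outputs
    with torch.no_grad():
        for src, tgt in test_loader:
            for i in range(src.size(0)):
                pred_ids = greedy_decode(model, src[i:i + 1], sos_idx, eos_idx)
                # In practice, strip padding/special tokens before scoring.
                source = [src_itos[t] for t in src[i].tolist()]
                target = [tgt_itos[t] for t in tgt[i].tolist()]
                pred = [tgt_itos[t] for t in pred_ids]
                score = sentence_bleu([target], pred, smoothing_function=smooth)
                print(f"SRC:  {' '.join(source)}")
                print(f"TGT:  {' '.join(target)}")
                print(f"PRED: {' '.join(pred)}")
                print(f"BLEU: {score:.4f}\n")
```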
Alternatively, you can run the full pipeline with a single command: `bash full_pipeline.sh --config config.yml`
To interact with your trained model through an intuitive web interface, follow the steps below to host a web server using Streamlit:
- Execute the command below to start the server (the extra `--` separates Streamlit's own arguments from those forwarded to the script): `streamlit run inference_streamlit.py -- --config config.yml`

  You can replace `config.yml` with your customized configuration file.
- If you prefer to test the server on Google Colab or do not have access to a GPU device, you can conveniently host the server there. Simply open the demo Google Colab Streamlit link, which lets you use my trained model and tokenizers.
Ensure that the necessary dependencies and libraries are installed before running the Streamlit server; you can then interact with your trained model through a user-friendly web interface.
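For reference, a Streamlit inference script generally follows the shape sketched below; `load_resources` and `translate` are placeholders for loading the trained model/tokenizers and running inference, not the repository's actual functions.

```python
# inference_streamlit.py (sketch): a minimal Streamlit translation UI.
import argparse

import streamlit as st

def parse_args():
    # Flags after "--" in the launch command are forwarded to this script.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, default="config.yml")
    return parser.parse_args()

@st.cache_resource  # load model and tokenizers once per server process
def load_resources(config_path):
    # Placeholder: load the trained Seq2Seq model and tokenizers here.
    return None

def translate(resources, text):
    # Placeholder: tokenize, run the model, and detokenize the output.
    return text[::-1]  # dummy "translation" so the sketch runs end to end

args = parse_args()
resources = load_resources(args.config)

st.title("English-Vietnamese Neural Machine Translation")
source = st.text_area("English sentence")
if st.button("Translate") and source:
    st.write(translate(resources, source))
```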
- https://github.com/bentrevett/pytorch-seq2seq
- https://github.com/pbcquoc/transformer
- https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1214/
- https://github.com/hyunwoongko/transformer
- https://machinelearningmastery.com/beam-search-decoder-natural-language-processing/
- https://en.wikipedia.org/wiki/BLEU