From d3d388b934ef515e96246ba643c924d675f6515d Mon Sep 17 00:00:00 2001
From: Suraj Patil
Date: Tue, 16 Mar 2021 20:20:00 +0530
Subject: [PATCH] fix M2M100 example (#10745)

---
 docs/source/model_doc/m2m_100.rst | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/docs/source/model_doc/m2m_100.rst b/docs/source/model_doc/m2m_100.rst
index b5c8d46bc919..757e198c2bdb 100644
--- a/docs/source/model_doc/m2m_100.rst
+++ b/docs/source/model_doc/m2m_100.rst
@@ -43,6 +43,9 @@ multilingual it expects the sequences in a certain format: A special language id
 source and target text. The source text format is :obj:`[lang_code] X [eos]`, where :obj:`lang_code` is source language
 id for source text and target language id for target text, with :obj:`X` being the source or target text.
 
+The :class:`~transformers.M2M100Tokenizer` depends on :obj:`sentencepiece` so be sure to install it before running the
+examples. To install :obj:`sentencepiece` run ``pip install sentencepiece``.
+
 - Supervised Training
 
 .. code-block::
@@ -87,7 +90,7 @@ id for source text and target language id for target text, with :obj:`X` being t
     "La vie est comme une boîte de chocolat."
 
     >>> # translate Chinese to English
-    >>> tokenizer.src_lang = "ar_AR"
+    >>> tokenizer.src_lang = "zh"
     >>> encoded_zh = tokenizer(chinese_text, return_tensors="pt")
     >>> generated_tokens = model.generate(**encoded_zh, forced_bos_token_id=tokenizer.get_lang_id("en"))
     >>> tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)