added doc changes
saransh-mehta committed Jun 4, 2020
1 parent a6433b9 commit 3e0d6a6
Showing 2 changed files with 98 additions and 15 deletions.
111 changes: 97 additions & 14 deletions README.rst
@@ -1,11 +1,12 @@

==============
multi-task-NLP
==============

multi_task_NLP is a utility toolkit that enables NLP developers to easily train and infer a single model for multiple tasks.
We support various data formats for the majority of NLI tasks and multiple transformer-based encoders (e.g. BERT, DistilBERT, ALBERT, RoBERTa, XLNet, etc.).

For the complete documentation of this library, please refer to the `documentation <https://multi-task-nlp.readthedocs.io/en/latest/>`_.

What is multi_task_NLP about?
-----------------------------

@@ -35,20 +36,102 @@ Quickstart Guide
A quick guide showing how a single model can be trained for multiple NLI tasks in just 3 simple steps,
with **no coding required!**

Follow these 3 simple steps to train your multi-task model!

Step 1 - Define your task file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The task file is a YAML file in which you define all the tasks for which you want to train a multi-task model.

::

   TaskA:
     model_type: BERT
     config_name: bert-base-uncased
     dropout_prob: 0.05
     label_map_or_file:
       - label1
       - label2
       - label3
     metrics:
       - accuracy
     loss_type: CrossEntropyLoss
     task_type: SingleSenClassification
     file_names:
       - taskA_train.tsv
       - taskA_dev.tsv
       - taskA_test.tsv

   TaskB:
     model_type: BERT
     config_name: bert-base-uncased
     dropout_prob: 0.3
     label_map_or_file: data/taskB_train_label_map.joblib
     metrics:
       - seq_f1
       - seq_precision
       - seq_recall
     loss_type: NERLoss
     task_type: NER
     file_names:
       - taskB_train.tsv
       - taskB_dev.tsv
       - taskB_test.tsv

To learn about the parameters available for your task file, refer to `task file parameters <https://multi-task-nlp.readthedocs.io/en/latest/define_multi_task_model.html#task-file-parameters>`_.
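Before running data preparation, it can help to sanity-check that every task entry defines the expected keys. The helper below is a hypothetical sketch, not part of multi-task-NLP; the required-key names are taken from the sample task file above.

```python
# Hypothetical sanity check for a task-file dict (not part of multi-task-NLP).
# In practice the dict would come from yaml.safe_load(open('sample_task_file.yml')).
REQUIRED_KEYS = {"model_type", "config_name", "loss_type", "task_type", "file_names"}

def check_tasks(tasks):
    """Raise ValueError if any task definition is missing a required key."""
    for name, cfg in tasks.items():
        missing = REQUIRED_KEYS - set(cfg)
        if missing:
            raise ValueError(f"task {name} is missing keys: {sorted(missing)}")
    return sorted(tasks)
```

Running it on a well-formed task dict returns the task names; a malformed entry fails loudly before any training time is spent.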

Step 2 - Run data preparation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After defining the task file, run the following command to prepare the data.

>>> python data_preparation.py \
--task_file 'sample_task_file.yml' \
--data_dir 'data' \
--max_seq_len 50

To learn about the ``data_preparation.py`` script and its arguments, refer to `running data preparation <https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-data-preparation>`_.

Step 3 - Run train
^^^^^^^^^^^^^^^^^^

Finally, you can start training with the following command.

>>> python train.py \
--data_dir 'data/bert-base-uncased_prepared_data' \
--task_file 'sample_task_file.yml' \
--out_dir 'sample_out' \
--epochs 5 \
--train_batch_size 4 \
--eval_batch_size 8 \
--grad_accumulation_steps 2 \
--log_per_updates 25 \
--save_per_updates 1000 \
--eval_while_train True \
--test_while_train True \
--max_seq_len 50 \
--silent True

To learn about the ``train.py`` script and its arguments, refer to `running train <https://multi-task-nlp.readthedocs.io/en/latest/training.html#running-train>`_.
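Note that with gradient accumulation, the batch size the optimizer effectively sees is ``train_batch_size * grad_accumulation_steps``, since gradients from several small batches are accumulated before each update. A minimal illustration (not part of the toolkit):

```python
# Effective batch size under gradient accumulation: gradients from
# grad_accumulation_steps mini-batches are summed before one optimizer step.
def effective_batch_size(train_batch_size, grad_accumulation_steps):
    return train_batch_size * grad_accumulation_steps

# With the flags above (--train_batch_size 4 --grad_accumulation_steps 2):
print(effective_batch_size(4, 2))  # 8
```

This lets you fit a larger effective batch on limited GPU memory by trading it for extra forward/backward passes.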


How to Infer?
=============

Once you have a multi-task model trained on your tasks, we provide a convenient and easy way to use it for getting
predictions on samples through the **inference pipeline**.

To run inference on samples using a trained model for, say, TaskA and TaskB,
import the ``inferPipeline`` class and load the corresponding multi-task model by creating an object of this class.

>>> from infer_pipeline import inferPipeline
>>> pipe = inferPipeline(modelPath = 'sample_out_dir/multi_task_model.pt', maxSeqLen = 50)

The ``infer`` function can then be called to get predictions on input samples
for the mentioned tasks.

>>> samples = [ ['sample_sentence_1'], ['sample_sentence_2'] ]
>>> tasks = ['TaskA', 'TaskB']
>>> pipe.infer(samples, tasks)

To learn about the ``infer_pipeline``, refer to `infer <https://multi-task-nlp.readthedocs.io/en/latest/infering.html>`_.
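For long sample lists you may want to call the pipeline in chunks rather than in one call. The helper below is a hypothetical sketch, assuming a ``pipe`` object exposing the ``infer(dataList, taskNamesList, ...)`` method shown in ``infer_pipeline.py``, and assuming ``infer`` returns a list of predictions; the chunk size is arbitrary.

```python
# Hypothetical sketch (not part of multi-task-NLP): run inference over a
# long sample list in fixed-size chunks. Assumes pipe.infer(dataList,
# taskNamesList) returns a list of predictions for the chunk.
def infer_in_chunks(pipe, samples, tasks, chunk_size=64):
    results = []
    for start in range(0, len(samples), chunk_size):
        results.extend(pipe.infer(samples[start:start + chunk_size], tasks))
    return results
```

Chunking keeps peak memory bounded when scoring thousands of samples at once.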
2 changes: 1 addition & 1 deletion infer_pipeline.py
@@ -250,7 +250,7 @@ def infer(self, dataList, taskNamesList, batchSize = 8, seed=42):
     Example::
-        >>> samples = [ ['sample_sentence_1], ['sample_sentence_2'] ]
+        >>> samples = [ ['sample_sentence_1'], ['sample_sentence_2'] ]
         >>> tasks = ['TaskA', 'TaskB']
         >>> pipe.infer(samples, tasks)
