
Sequential Training: How to fine-tune on MLM? #1287

Closed
slowwavesleep opened this issue Mar 3, 2021 · 3 comments
Labels
assigned Is being looked into/followed-up on by a dev


@slowwavesleep

I'm trying to do Sequential Training based on this example. I want to use MLM as the intermediate task and RTE as the target task. As expected, runscript.py in download_data doesn't support MLM.

As far as I understand, I can repurpose config.json from some other task by editing the corresponding paths and providing my own data. However, I have a few questions:

  1. What is the expected format of examples in the jsonl files? What should be the name of the key containing the text itself (like "premise" and "hypothesis" in RTE)? Or is it not jsonl at all?
  2. Should I specify the path to the val split? Is it going to be utilized?
  3. Do I have to make any modifications to config files elsewhere?

Perhaps I'm missing some easier way to do this; if so, please point me to it. Ideally, I would like to simply repurpose text examples from some previously downloaded task, possibly from the same target task. Is there a way to do that using existing functionality?

@zphang
Collaborator

zphang commented Mar 4, 2021

Hi,

To clarify, you want to do MLM on some data (potentially from an existing task, but really any text corpus), and then fine-tune on RTE after, correct?

The MLM (specifically MLM-simple) task will read from a file and treat each line as an input:

@classmethod
def _get_examples_generator(cls, path, set_type):
    with open(path, "r") as f:
        for (i, line) in enumerate(f):
            line = line.strip()
            if not line:
                continue
            yield Example(
                guid="%s-%s" % (set_type, i), text=line,
            )
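For illustration, the same line-per-example behavior can be sketched outside jiant. Here `Example` is a hypothetical stand-in namedtuple, not jiant's actual example class:

```python
# Minimal standalone sketch of the line-per-example reading above.
# `Example` is a stand-in namedtuple, not jiant's class.
from collections import namedtuple
from io import StringIO

Example = namedtuple("Example", ["guid", "text"])

def examples_from_lines(f, set_type):
    for i, line in enumerate(f):
        line = line.strip()
        if not line:
            continue  # blank lines produce no example
        yield Example(guid="%s-%s" % (set_type, i), text=line)

corpus = StringIO("The cat sat.\n\nNo quick fix exists.\n")
examples = list(examples_from_lines(corpus, "train"))
# Two examples; the blank middle line is skipped, but guids keep the
# original line indices from enumerate: "train-0" and "train-2".
```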

You would likely want to:

  1. Create a text file of MLM inputs based on your desired task/corpus
  2. Write a task-config file pointing to the above, with the task type mlm_simple
  3. Set up the intermediate training run. IMPORTANT: for the load mode, you have to set from_transformers_with_mlm to load the MLM head, otherwise it will initialize the MLM head from scratch.
  4. Take the saved weights and fine-tune on RTE

Also note that we only support MLM for BERT, RoBERTa, ALBERT, and XLM-R models.

If it helps, I can write a quick colab to show this.

@slowwavesleep
Author

slowwavesleep commented Mar 4, 2021

Thank you very much!

Yes, that's exactly what I'm trying to do. At the moment I'm experimenting on roberta-large. I'm trying to find out whether doing MLM as an intermediate task helps with RTE.

I made a config file for mlm_simple:

{
    "task": "mlm_simple",
    "paths": {
        "train": "/content/tasks/data/mlm_simple/train.txt",
        "val": "/content/tasks/data/mlm_simple/val.txt"
    },
    "name": "mlm_simple"
}

Then I took premises and hypotheses from RTE train and val and put them in the corresponding files as individual lines, as you described.

I'm not sure I understand how to use the load mode correctly, though. Can you please elaborate a bit further?

I initialize an MLM run like this:

run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/mlm_simple_run_config.json",
    output_dir="./runs/mlm_simple",
    hf_pretrained_model_name_or_path="roberta-large",
    model_path="./models/roberta-large/model/model.p",
    model_config_path="./models/roberta-large/model/config.json",
    model_load_mode="from_transformers_with_mlm",
    learning_rate=1e-5,
    eval_every_steps=500,
    do_train=True,
    do_val=True,
    do_save=True,
    force_overwrite=True,
)
main_runscript.run_loop(run_args)

Here the from_transformers_with_mlm mode ensures that the existing MLM head weights are loaded and trained. Do I understand this right?

Then I run the RTE task like this:

run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/rte_run_config.json",
    output_dir="./runs/mlm_simple___rte",
    hf_pretrained_model_name_or_path="roberta-large",
    model_path="./runs/mlm_simple/best_model.p",  # Loading the best model
    model_load_mode="partial",
    model_config_path="./models/roberta-large/model/config.json",
    learning_rate=1e-5,
    eval_every_steps=500,
    no_improvements_for_n_evals=2,
    do_train=True,
    do_val=True,
    force_overwrite=True,
    do_save_best=True
)
main_runscript.run_loop(run_args)

Do I still need to specify partial load mode here?

@zphang
Collaborator

zphang commented Mar 14, 2021

Sorry for the delay in my reply. Yes, partial or encoder_only will both work. (encoder_only takes only the encoder; partial will try to match task heads, but since none of the task heads match here, it also reduces to encoder-only.)

Do let me know if you run into any other issues!

@zphang zphang added the assigned Is being looked into/followed-up on by a dev label Mar 14, 2021