
Sequential Training: How to fine-tune on MLM? #1287

Closed
slowwavesleep opened this issue Mar 3, 2021 · 3 comments
Labels
assigned Is being looked into/followed-up on by a dev


@slowwavesleep

I'm trying to do Sequential Training based on this example. I want to use MLM as the intermediate task and RTE as the target task. As expected, runscript.py in download_data doesn't support MLM.

As far as I understand, I can repurpose config.json from some other task by editing the corresponding paths and providing my own data. However, I have a few questions:

  1. What is the expected format of examples in the jsonl files? What should be the name of the key containing the text itself (like "premise" and "hypothesis" in RTE)? Or is it not jsonl at all?
  2. Should I specify the path to the val split? Is it going to be utilized?
  3. Do I have to make any modifications to config files elsewhere?

Perhaps I'm missing some easier way to do this; if so, please point me to it. Ideally, I would like to simply repurpose text examples from some previously downloaded task, possibly from the same target task. Is there a way to do that using existing functionality?

@zphang
Collaborator

zphang commented Mar 4, 2021

Hi,

To clarify, you want to do MLM on some data (potentially from an existing task, but really any text corpus), and then fine-tune on RTE after, correct?

The MLM (specifically MLM-simple) task will read from a file and treat each line as an input:

@classmethod
def _get_examples_generator(cls, path, set_type):
    with open(path, "r") as f:
        for (i, line) in enumerate(f):
            line = line.strip()
            if not line:
                continue
            yield Example(
                guid="%s-%s" % (set_type, i), text=line,
            )
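For illustration, the same line-per-example behavior can be sketched outside jiant. Here `Example` is a hypothetical stand-in namedtuple, not jiant's actual example class:

```python
# Minimal standalone sketch of the line-per-example reading above.
# `Example` is a stand-in namedtuple, not jiant's class.
from collections import namedtuple
from io import StringIO

Example = namedtuple("Example", ["guid", "text"])

def examples_from_lines(f, set_type):
    for i, line in enumerate(f):
        line = line.strip()
        if not line:
            continue  # blank lines produce no example
        yield Example(guid="%s-%s" % (set_type, i), text=line)

corpus = StringIO("The cat sat.\n\nNo quick fix exists.\n")
examples = list(examples_from_lines(corpus, "train"))
# Two examples; the blank middle line is skipped, but guids keep the
# original line indices from enumerate: "train-0" and "train-2".
```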

You would likely want to:

  1. Create a text file of MLM inputs based on your desired task/corpus
  2. Write a task-config file pointing to the above, with the task type mlm_simple
  3. Set up the intermediate training run. IMPORTANT: for the load mode, you have to set from_transformers_with_mlm to load the MLM head, otherwise it will initialize the MLM head from scratch.
  4. Take the saved weights and fine-tune on RTE

Also note that we only support MLM for BERT, RoBERTa, ALBERT, and XLM-R models.

If it helps, I can write a quick colab to show this.

@slowwavesleep
Author

slowwavesleep commented Mar 4, 2021

Thank you very much!

Yes, that's exactly what I'm trying to do. At the moment I'm experimenting on roberta-large. I'm trying to find out whether doing MLM as an intermediate task helps with RTE.

I made a config file for mlm_simple:

{
    "task": "mlm_simple",
    "paths": {
        "train": "/content/tasks/data/mlm_simple/train.txt",
        "val": "/content/tasks/data/mlm_simple/val.txt"
    },
    "name": "mlm_simple"
}

Then I took premises and hypotheses from RTE train and val and put them in the corresponding files as individual lines, as you described.

I'm not sure I understand how to use the load mode correctly, though. Can you please elaborate a bit further?

I initialize an MLM run like this:

run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/mlm_simple_run_config.json",
    output_dir="./runs/mlm_simple",
    hf_pretrained_model_name_or_path="roberta-large",
    model_path="./models/roberta-large/model/model.p",
    model_config_path="./models/roberta-large/model/config.json",
    model_load_mode="from_transformers_with_mlm",
    learning_rate=1e-5,
    eval_every_steps=500,
    do_train=True,
    do_val=True,
    do_save=True,
    force_overwrite=True,
)
main_runscript.run_loop(run_args)

Here the from_transformers_with_mlm mode ensures that the existing MLM head weights are loaded and trained. Do I understand this right?

Then I run the RTE task like this:

run_args = main_runscript.RunConfiguration(
    jiant_task_container_config_path="./run_configs/rte_run_config.json",
    output_dir="./runs/mlm_simple___rte",
    hf_pretrained_model_name_or_path="roberta-large",
    model_path="./runs/mlm_simple/best_model.p",  # Loading the best model
    model_load_mode="partial",
    model_config_path="./models/roberta-large/model/config.json",
    learning_rate=1e-5,
    eval_every_steps=500,
    no_improvements_for_n_evals=2,
    do_train=True,
    do_val=True,
    force_overwrite=True,
    do_save_best=True
)
main_runscript.run_loop(run_args)

Do I still need to specify partial load mode here?

@zphang
Collaborator

zphang commented Mar 14, 2021

Sorry for the delay in my reply. Yes, partial or encoder_only will both work. (encoder_only takes only the encoder; partial will try to match task heads, but since none of the task heads match here, it also reduces to encoder-only.)

Do let me know if you run into any other issues!

@zphang zphang added the assigned Is being looked into/followed-up on by a dev label Mar 14, 2021