Merge pull request asyml#91 from TomNong/GPT-2
Added gpt-2 language model example
ZhitingHu authored Feb 28, 2019
2 parents 254bef4 + 766e08a commit 521f7ef
Showing 10 changed files with 716 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -255,3 +255,5 @@ simple-examples.tgz
/examples/bert/bert_pretrained_models/
!/examples/bert/bert_pretrained_models/download_model.sh
/examples/bert/output
/examples/gpt-2/gpt2_pretrained_models/
!/examples/gpt-2/gpt2_pretrained_models/download_model.sh
150 changes: 150 additions & 0 deletions examples/gpt-2/README.md
@@ -0,0 +1,150 @@
# GPT-2: Pre-trained Language Model

This is a Texar implementation of the [OpenAI GPT-2 (Generative Pre-Training)](https://github.com/openai/gpt-2) language model. It supports loading the official pre-trained model parameters, generating samples, and more.

With Texar, building the GPT-2 model is as simple as creating a [`TransformerDecoder`](https://texar.readthedocs.io/en/latest/code/modules.html#transformerdecoder) instance. The parameters of the `TransformerDecoder` can be initialized from a pre-trained GPT-2 checkpoint by calling `init_gpt2_checkpoint(path_to_gpt2_checkpoint)`.

In sum, this example showcases:

* Constructing and using pre-trained GPT-2 models in Texar
* Using GPT-2 to generate text samples with or without context
* Examples of other use cases

## Quick Start
### Download GPT-2 Pre-trained Model

Download the GPT-2 model checkpoint with the following command:
```
sh gpt2_pretrained_models/download_model.sh 117M
```
By default, it will download a pretrained model named `117M` to `gpt2_pretrained_models/`.

### Usage
| WARNING: Samples are unfiltered and may contain offensive content. |
| --- |

#### Interactive mode (to generate samples with context)

This mode starts an interactive interface in which the user types a context sentence. The model then generates a continuation of that context using top-k sample decoding.

```
python generative_pretraining_main.py --is_interactive \
--max_decoding_length=100 \
--temperature=0.7 \
--top_k=40
```

Here:

- `is_interactive`: Specifies interactive mode.
- `max_decoding_length`: The maximum number of tokens in the sample. **Note that this includes tokens in the context**.
- `temperature`: Softmax temperature of top-k sample decoding. Larger values (above 1.0) result in more random samples, while smaller values push the sampling distribution towards the argmax. Must be strictly greater than 0. Defaults to `0.7`.
- `top_k`: Number of most likely candidates kept from the vocabulary distribution at each decoding step. Defaults to `40`.
- `nsamples`: Number of samples to generate for each input.
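
To make the roles of `temperature` and `top_k` concrete, here is a minimal pure-Python sketch of one top-k sampling step. It is an illustration of the general technique only, not Texar's implementation; the function name `top_k_sample` and the toy logits are hypothetical.

```python
import math
import random


def top_k_sample(logits, top_k=40, temperature=0.7, rng=random):
    """Sample one token id from `logits` with top-k sampling and temperature.

    Hypothetical helper for illustration; not part of Texar.
    """
    if temperature <= 0:
        raise ValueError("temperature must be strictly greater than 0")
    k = min(top_k, len(logits))
    # Keep the ids of the k largest logits; all other tokens get zero probability.
    top_ids = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights over the kept logits
    # (shifted by the maximum for numerical stability).
    scaled = [logits[i] / temperature for i in top_ids]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(top_ids, weights=weights, k=1)[0]


# Toy vocabulary of five tokens.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
rng = random.Random(0)
samples = [top_k_sample(logits, top_k=2, temperature=0.7, rng=rng)
           for _ in range(1000)]
# With top_k=2, only token ids 0 and 1 can ever be drawn.
print(sorted(set(samples)))
```

Lowering `temperature` sharpens the distribution over the kept candidates, and `top_k=1` reduces to greedy (argmax) decoding.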

**Example input:**
```
Model input >>> Micheal Jordan is the greatest player in history !
```
**Example output:**
```
======================================== SAMPLE 1 ========================================
He's the one who has made all the difference. He's a true legend. He's a great athlete,
a great athlete. He's a great athlete. I'm so happy for him. I'm so happy for his family,
the family, and I'm so happy for him. I'm so happy for his teammates, his teammates, and
I'm so happy for him.
The last time we saw him on stage, he
================================================================================
```

#### Non-interactive mode (to generate samples from scratch)

This mode generates a batch of samples from scratch.

```
python generative_pretraining_main.py \
--nsamples=1 \
--batch_size=1 \
--max_decoding_length=100 \
--temperature=0.7 \
--top_k=40
```

Here:

- `nsamples`: Total number of samples to generate. Must be divisible by `batch_size`.
- `batch_size`: Each iteration generates `batch_size` number of samples.
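
The relationship between the two flags can be sketched as follows. The helper `num_iterations` is hypothetical, and how the script itself validates the flags is an assumption; the sketch only shows why `nsamples` has to be a multiple of `batch_size`.

```python
def num_iterations(nsamples, batch_size):
    """Return how many batches are needed to produce `nsamples` samples.

    Hypothetical helper for illustration; not taken from the example script.
    """
    if nsamples % batch_size != 0:
        raise ValueError("nsamples must be divisible by batch_size")
    return nsamples // batch_size


# 8 samples generated 4 at a time take 2 iterations.
print(num_iterations(nsamples=8, batch_size=4))
```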

**Example output:**

```
"A new government and a healthy economy have a chance to take this up."
After he said the election's outcome in the House was important and had helped to build
confidence in the House, former Ukip leader Nigel Farage spoke about working to boost
the economy, saying the vote for the "lefties" and others "were bad optics for Labour
in this way".
```

## Other Use Cases

Texar's `TransformerDecoder` (and other RNN-based decoders) easily supports common, advanced, or customized use, such as:

* Sample or continuation generation
* Greedy / (top-k) sample / Gumbel-softmax / beam-search / ... / your-customized decoding
* Training / fine-tuning in (un)conditional settings
* Perplexity evaluation

**For example**, after creating the language model
```python
decoder = TransformerDecoder(embedder, hparams=gpt2_hparams)
```
we can do the following:

**Ex. Use 1): Continuation generation w/ greedy decoding**

```python
output, output_length = decoder(
    context=ctx,
    context_sequence_length=ctx_len,
    decoding_strategy='infer_greedy',
    end_token=end_token)

sample_id = output.sample_id
logits = output.logits
```

**Ex. Use 2): Top-k sample decoding**

```python
topk_helper = tx.modules.TopKSampleEmbeddingHelper(
    embedding=embedder,
    start_tokens=ctx[:, 0],
    end_token=end_token,
    top_k=20,
    softmax_temperature=0.7)

output, output_length = decoder(
    context=ctx,
    context_sequence_length=ctx_len,
    helper=topk_helper)
```

**Ex. Use 3): Fine-tuning for conditional generation**

```python
output, output_length = decoder(
    memory=source_hidden_states,
    memory_sequence_length=src_len,
    inputs=truth_target,
    sequence_length=tgt_len - 1,
    decoding_strategy='train_greedy')

# Labels are the targets shifted left by one step, so that the output at
# step t is scored against the token at step t + 1.
loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=truth_target[:, 1:],
    logits=output.logits,
    sequence_length=tgt_len - 1)
```
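
The one-step shift between decoder inputs and labels in the fine-tuning example can be illustrated without TensorFlow. The following is a minimal pure-Python sketch for a single sequence; the function `sequence_cross_entropy`, the token ids, and the choice to sum the loss over time steps are assumptions made for illustration, not Texar's implementation.

```python
import math


def sequence_cross_entropy(labels, logits, sequence_length):
    """Cross-entropy summed over the first `sequence_length` time steps.

    Hypothetical single-sequence helper for illustration only.
    """
    total = 0.0
    for t in range(sequence_length):
        step = logits[t]
        # Log of the softmax normalizer, shifted for numerical stability.
        peak = max(step)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in step))
        # Negative log-probability of the correct token at step t.
        total += log_norm - step[labels[t]]
    return total


# Hypothetical target ids, with 0 as <BOS> and 1 as <EOS>.
truth_target = [0, 5, 7, 1]
tgt_len = len(truth_target)

# The decoder reads tokens 0..tgt_len-2 and is scored against tokens 1..tgt_len-1.
labels = truth_target[1:]

# Uniform logits over a toy vocabulary of 8 tokens, one row per scored step.
vocab_size = 8
logits = [[0.0] * vocab_size for _ in range(tgt_len - 1)]

loss = sequence_cross_entropy(labels, logits, sequence_length=tgt_len - 1)
print(loss)
```

With uniform logits every step costs `log(vocab_size)`, so the total is `(tgt_len - 1) * log(8)`, which makes the sketch easy to sanity-check.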
3 changes: 3 additions & 0 deletions examples/gpt-2/gpt2_config_lib/README.md
@@ -0,0 +1,3 @@
### Configuration files of GPT-2 models in Texar style.

For example, `config_model_117M.py` is the Texar configuration file corresponding to the `117M` model downloaded from [GPT-2 official release](https://github.com/openai/gpt-2).
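
As a rough sanity check on such a configuration, the parameter count implied by the `117M` hyperparameters can be estimated in a few lines. The values of `vocab_size`, `dim`, `num_blocks`, and `position_size` are copied from `config_model_117M.py`; the two layer norms per block, the final layer norm, and the output projection sharing weights with the token embedding are assumptions based on the standard GPT-2 architecture and are not stated in the config itself.

```python
# Hyperparameters copied from config_model_117M.py.
vocab_size = 50257
dim = 768
num_blocks = 12
position_size = 1024
ffn_units = dim * 4

token_embedding = vocab_size * dim
position_embedding = position_size * dim

# Self-attention: query/key/value projections plus the output projection,
# all with biases ("use_bias": True).
attention = (dim * 3 * dim + 3 * dim) + (dim * dim + dim)
# Position-wise feed-forward: two Dense layers with biases.
feed_forward = (dim * ffn_units + ffn_units) + (ffn_units * dim + dim)
# Assumption: two layer norms per block, each with a scale and a bias vector.
layer_norms = 2 * (2 * dim)
per_block = attention + feed_forward + layer_norms

# Assumption: one final layer norm, and an output layer tied to the token
# embedding (so it adds no parameters).
final_layer_norm = 2 * dim

total = (token_embedding + position_embedding
         + num_blocks * per_block + final_layer_norm)
print(f"{total:,}")  # 124,439,808
```

Under these assumptions the estimate comes to roughly 124 million parameters, somewhat more than the `117M` name suggests.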
Empty file.
56 changes: 56 additions & 0 deletions examples/gpt-2/gpt2_config_lib/config_model_117M.py
@@ -0,0 +1,56 @@
"""Texar config file of the GPT-2 117M model.
"""

vocab_size = 50257
dim = 768

embed = {
    "dim": dim,
}

decoder = {
    "scale_embeds": False,
    "dim": dim,
    "num_blocks": 12,
    "multihead_attention": {
        "use_bias": True,
        "num_units": dim,
        "num_heads": 12,
        "output_dim": dim,
    },
    "position_embedder_type": "simple",
    "position_size": 1024,
    "position_embedder_hparams": {
        "dim": dim,
    },
    "initializer": {
        "type": "variance_scaling_initializer",
        "kwargs": {
            "scale": 1.0,
            "mode": "fan_avg",
            "distribution": "uniform",
        },
    },
    "poswise_feedforward": {
        "layers": [
            {
                "type": "Dense",
                "kwargs": {
                    "name": "conv1",
                    "units": dim * 4,
                    "activation": "gelu",
                    "use_bias": True,
                }
            },
            {
                "type": "Dense",
                "kwargs": {
                    "name": "conv2",
                    "units": dim,
                    "use_bias": True,
                }
            }
        ],
        "name": "ffn",
    },
}