
Commit 37eaa4a

Authored Mar 28, 2022
Add missed citations & grammar/typo fixes to constrained beam search blog post (huggingface#239)
1 parent f5c44da commit 37eaa4a

File tree

2 files changed: +15 -13 lines changed

 

constrained-beam-search.md (+7 -6)
@@ -44,14 +44,14 @@ Let's say that we're want to generate a sentence `S` that has to include the phr

 $$ S_{expected} = \{ s_1, s_2, ..., s_k, t_1, t_2, s_{k+1}, ..., s_n \} $$

-The problem is that beam search generates the sequence *token-by-token*. Though not entirely accurate, one can think of beam search as the function \\( B(\mathbf{s}_{0:i}) = s_{i+1} \\), where it looks as the currently generated sequence of tokens from \\( 0 \\) to \\( i \\) then predicts the next token at \\( i+1 \\) . But how can this function know, at an arbitrary step \\( i < k \\) , that the tokens must be generated at some future step \\( k \\) ? Or when it's at the step \\( i=k \\) , how can it know for sure that this is the best spot to force the tokens, instead of some future step \\( i>k \\) ?
+The problem is that beam search generates the sequence *token-by-token*. Though not entirely accurate, one can think of beam search as the function \\( B(\mathbf{s}_{0:i}) = s_{i+1} \\), where it looks at the currently generated sequence of tokens from \\( 0 \\) to \\( i \\) then predicts the next token at \\( i+1 \\) . But how can this function know, at an arbitrary step \\( i < k \\) , that the tokens must be generated at some future step \\( k \\) ? Or when it's at the step \\( i=k \\) , how can it know for sure that this is the best spot to force the tokens, instead of some future step \\( i>k \\) ?

 ![Why constraints are hard](https://raw.githubusercontent.com/huggingface/blog/main/assets/53_constrained_beam_search/why_constraints_are_hard.png)


 And what if you have multiple constraints with varying requirements? What if you want to force the phrase \\( p_1=\{t_1, t_2\} \\) *and* also the phrase \\( p_2=\{ t_3, t_4, t_5, t_6\} \\) ? What if you want the model to **choose between** the two phrases? What if we want to force the phrase \\( p_1 \\) and force just one phrase among the list of phrases \\( \{p_{21}, p_{22}, p_{23}\} \\) ?

-The above are actually very reasonable use-cases, as it will be shown below, and the new constrained beam search feature allows for all of them!
+The above examples are actually very reasonable use-cases, as it will be shown below, and the new constrained beam search feature allows for all of them!

 This post will quickly go over what the new ***constrained beam search*** feature can do for you and then go into deeper details about how it works under the hood.

@@ -219,7 +219,7 @@ In the next step, we consider the next possible tokens for each of the three bra

 Though we end up *considering* significantly more than `num_beams` outputs, we reduce them down to `num_beams` at the end of the step. We can't just keep branching out, then the number of `beams` we'd have to keep track of would be \\( \text{beams}^{n} \\) for \\( n \\) steps, which becomes very large very quickly ( \\( 10 \\) beams after \\( 10 \\) steps is \\( 10,000,000,000 \\) beams!).

-For the rest of the generation, we repeat the above step until an ending criteria has been met, like generating the `<eos>` token or reaching `max_length`, for example. Branch out, rank, reduce, and repeat.
+For the rest of the generation, we repeat the above step until the ending criteria has been met, like generating the `<eos>` token or reaching `max_length`, for example. Branch out, rank, reduce, and repeat.


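As context for the hunk above, here is a minimal sketch of the vanilla branch-out/rank/reduce loop through the `transformers` `generate()` API. The model, prompt, and generation settings are illustrative placeholders, not taken from this commit:

```python
# Minimal sketch of ordinary beam search with the transformers generate() API.
# Model, prompt, and settings are illustrative placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

input_ids = tokenizer(
    "translate English to German: How old are you?", return_tensors="pt"
).input_ids

# num_beams candidate sequences are kept at every step: branch out, rank,
# reduce back down to num_beams, and repeat until <eos> or max_length.
outputs = model.generate(input_ids, num_beams=10, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```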
@@ -232,7 +232,7 @@ For the rest of the generation, we repeat the above step until an ending criteri

 Constrained beam search attempts to fulfill the constraints by *injecting* the desired tokens at every step of the generation.

-Let's say that we're trying to force the phrase `"is fast"` in the generation output.
+Let's say that we're trying to force the phrase `"is fast"` in the generated output.

 In the traditional beam search setting, we find the top `k` most probable next tokens at each branch and append them for consideration. In the constrained setting, we do the same but also append the tokens that will take us *closer to fulfilling our constraints*. Here's a demonstration:

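In terms of the user-facing API, forcing a phrase such as `"is fast"` can be expressed with the `force_words_ids` argument of `generate()`. The following is a hedged sketch, assuming a `transformers` release that includes the constrained beam search feature; the GPT-2 model, prompt, and settings are illustrative:

```python
# Sketch of forcing the phrase "is fast" into the generated output.
# Assumes a transformers version that ships force_words_ids for generate().
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

force_words_ids = tokenizer(["is fast"], add_special_tokens=False).input_ids
input_ids = tokenizer("The dog", return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,   # tokens that must appear in the output
    num_beams=5,                       # constrained generation requires beam search
    max_length=10,
    remove_invalid_values=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```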
@@ -261,7 +261,7 @@ This behavior is demonstrated in the third step of the above example:

 ![Constrained Beam Search Step 3](https://raw.githubusercontent.com/huggingface/blog/main/assets/53_constrained_beam_search/cbeam_3.jpg)

-Notice how `"The is fast"` doesn't require any manual appending of constraint tokens since it's already fulfilled (i.e., already contains the phrase `"is fast"`). Also, notice how beams like `"The dog is slow"` or `"The dog is mad"` is actually in Bank 0, since, although it includes the token `"is"`, it must restart from the beginning to generate `"is fast"`. By appending something like `"slow"` after `"is"`, it has effectively *reset its progress*.
+Notice how `"The is fast"` doesn't require any manual appending of constraint tokens since it's already fulfilled (i.e., already contains the phrase `"is fast"`). Also, notice how beams like `"The dog is slow"` or `"The dog is mad"` are actually in Bank 0, since, although it includes the token `"is"`, it must restart from the beginning to generate `"is fast"`. By appending something like `"slow"` after `"is"`, it has effectively *reset its progress*.

 And finally notice how we ended up at a sensible output that contains our constraint phrase: `"The dog is fast"`!

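The "bank" idea touched on in this hunk (group candidate beams by how much of the constraint they have fulfilled, then pick the final `num_beams` round-robin across banks, best-scoring first) can be illustrated with a toy snippet. This is a simplified sketch for intuition only, not the actual `transformers` implementation:

```python
# Toy illustration of bank-based beam selection: candidates are grouped by
# constraint progress, each bank is sorted by score, and the final num_beams
# are taken round-robin across banks (most progress first).
from collections import defaultdict

def select_beams(candidates, num_beams):
    # candidates: list of (sequence, score, constraint_tokens_fulfilled)
    banks = defaultdict(list)
    for seq, score, fulfilled in candidates:
        banks[fulfilled].append((score, seq))
    for bank in banks.values():
        bank.sort(key=lambda x: x[0], reverse=True)  # best score first in each bank

    selected = []
    bank_order = sorted(banks, reverse=True)  # banks with the most progress first
    while len(selected) < num_beams and any(banks[k] for k in bank_order):
        for k in bank_order:  # take at most one beam from every bank per round
            if banks[k] and len(selected) < num_beams:
                selected.append(banks[k].pop(0)[1])
    return selected

beams = select_beams(
    [("The dog is fast", -1.2, 2), ("The dog is slow", -0.9, 0), ("The is fast", -2.5, 2)],
    num_beams=2,
)
print(beams)  # ['The dog is fast', 'The dog is slow']
```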
@@ -324,7 +324,7 @@ template = ["the", "", "School of", "", "in"]

 possible_outputs == [
     "The woman attended the Ross School of Business in Michigan.",
-    "The woman was the administrator for the Harvard school of Business in MA."
+    "The woman was the administrator for the Harvard School of Business in MA."
 ]
 ```

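For the kind of mixed, template-style requirement shown in this hunk, the feature exposes `Constraint` objects that can be passed to `generate()`. The sketch below is illustrative rather than taken from the post (model, prompt, and phrases are placeholders), and assumes a `transformers` release that ships `PhrasalConstraint` (force an exact phrase) and `DisjunctiveConstraint` (force one phrase out of a list):

```python
# Hedged sketch: mixing a hard phrase constraint with a "choose one of" constraint.
# Model, prompt, and phrases are placeholders.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DisjunctiveConstraint,
    PhrasalConstraint,
)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")

constraints = [
    # The exact phrase "School of" must appear in the output.
    PhrasalConstraint(tokenizer("School of", add_special_tokens=False).input_ids),
    # Exactly one of these phrases must appear in the output.
    DisjunctiveConstraint(tokenizer(["Ross", "Harvard"], add_special_tokens=False).input_ids),
]

input_ids = tokenizer(
    "summarize: The woman ran a business school program.", return_tensors="pt"
).input_ids

outputs = model.generate(input_ids, constraints=constraints, num_beams=10, max_length=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```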
@@ -350,6 +350,7 @@ Constrained beam search gives us a flexible means to inject external knowledge a

 This new feature is based mainly on the following papers:

+- [Guided Open Vocabulary Image Captioning with Constrained Beam Search](https://arxiv.org/pdf/1612.00576.pdf)
 - [Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation](https://arxiv.org/abs/1804.06609)
 - [Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting](https://aclanthology.org/N19-1090/)
 - [Guided Generation of Cause and Effect](https://arxiv.org/pdf/2107.09846.pdf)

notebooks/53_constrained_beam_search.ipynb (+8 -7)
@@ -22,7 +22,7 @@
 "colab_type": "text"
 },
 "source": [
-"<a href=\"https://colab.research.google.com/github/cwkeam/blog/blob/cwkeam%2Fconstrained-beam-search/notebooks/53_constrained_beam_search.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+"<a href=\"https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/53_constrained_beam_search.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
 ]
 },
 {
@@ -55,14 +55,14 @@
 "\n",
 "$S_{expected} = \\{ s_1, s_2, ..., s_k, t_1, t_2, s_{k+1}, ..., s_n \\}$ \n",
 "\n",
-"The problem is that beam search generates the sequence *token-by-token*. Though not entirely accurate, one can think of beam search as the function $B(\\mathbf{s}_{0:i}) = s_{i+1}$, where it looks as the currently generated sequence of tokens from $0$ to $i$ then predicts the next token at $i+1$. But how can this function know, at an arbitrary step $i < k$, that the tokens must be generated at some future step $k$? Or when it's at the step $i=k$, how can it know for sure that this is the best spot to force the tokens, instead of some future step $i>k$?\n",
+"The problem is that beam search generates the sequence *token-by-token*. Though not entirely accurate, one can think of beam search as the function $B(\\mathbf{s}_{0:i}) = s_{i+1}$, where it looks at the currently generated sequence of tokens from $0$ to $i$ then predicts the next token at $i+1$. But how can this function know, at an arbitrary step $i < k$, that the tokens must be generated at some future step $k$? Or when it's at the step $i=k$, how can it know for sure that this is the best spot to force the tokens, instead of some future step $i>k$?\n",
 "\n",
 "![Why constraints are hard](https://raw.githubusercontent.com/cwkeam/scientific-images/main/why_constraints_are_hard.png)\n",
 "\n",
 "\n",
 "And what if you have multiple constraints with varying requirements? What if you want to force the phrase $p_1=\\{t_1, t_2\\}$ *and* also the phrase $p_2=\\{ t_3, t_4, t_5, t_6\\}$? What if you want the model to **choose between** the two phrases? What if we want to force the phrase $p_1$ and force just one phrase among the list of phrases $\\{p_{21}, p_{22}, p_{23}\\}$? \n",
 "\n",
-"The above are actually very reasonable use-cases, as it will be shown below, and the new constrained beam search feature allows for all of them!\n",
+"The above examples are actually very reasonable use-cases, as it will be shown below, and the new constrained beam search feature allows for all of them!\n",
 "\n",
 "This post will quickly go over what the new ***constrained beam search*** feature can do for you, and then go into deeper details about how it works under the hood."
 ]
@@ -325,7 +325,7 @@
 "\n",
 "Though we end up *considering* significantly more than `num_beams` outputs, we reduce them down to `num_beams` at the end of the step. We can't just keep branching out, then the number of `beams` we'd have to keep track of would be $beams^{n}$ for $n$ steps, which becomes very large very quickly ($10$ beams after $10$ steps is $10,000,000,000$ beams!). \n",
 "\n",
-"For the rest of the generation, we repeat the above step until an ending criteria has been met, like generating the `<eos>` token or reaching `max_length`, for example. Branch out, rank, reduce, and repeat.\n",
+"For the rest of the generation, we repeat the above step until the ending criteria has been met, like generating the `<eos>` token or reaching `max_length`, for example. Branch out, rank, reduce, and repeat.\n",
 "\n",
 "\n",
 "\n",
@@ -341,7 +341,7 @@
 "\n",
 "Constrained beam search attempts to fulfill the constraints by *injecting* the desired tokens at every step of the generation. \n",
 "\n",
-"Let's say that we're trying to force the phrase `\"is fast\"` in the generation output. \n",
+"Let's say that we're trying to force the phrase `\"is fast\"` in the generated output. \n",
 "\n",
 "In the traditional beam search setting, we find the top `k` most probable next tokens at each branch and append them for consideration. In the constrained setting we actually do the same, but also append the tokens that will take us *closer to fulfilling our constraints*. Here's a demonstration:\n",
 "\n",
@@ -370,7 +370,7 @@
 "\n",
 "![Constrained Beam Search Step 3](https://raw.githubusercontent.com/cwkeam/scientific-images/main/cbeam_3.jpg)\n",
 "\n",
-"Notice how `\"The is fast\"` doesn't require any manual appending of constraint tokens since it's already fulfilled (i.e. already contains the phrase `\"is fast\"`). Also notice how beams like `\"The dog is slow\"` or `\"The dog is mad\"` is actually in Bank 0, since, although it includes the token `\"is\"`, it must restart from the beginning in order to generate `\"is fast\"`. By appending something like `\"slow\"` after `\"is\"`, it has effectively *reset its progress*. \n",
+"Notice how `\"The is fast\"` doesn't require any manual appending of constraint tokens since it's already fulfilled (i.e. already contains the phrase `\"is fast\"`). Also notice how beams like `\"The dog is slow\"` or `\"The dog is mad\"` are actually in Bank 0, since, although it includes the token `\"is\"`, it must restart from the beginning in order to generate `\"is fast\"`. By appending something like `\"slow\"` after `\"is\"`, it has effectively *reset its progress*. \n",
 "\n",
 "And finally notice how we ended up at a sensible output that contains our constraint phrase: `\"The dog is fast\"`! \n",
 "\n",
@@ -461,7 +461,7 @@
 "\n",
 "possible_outputs == [\n",
 " \"The woman attended the Ross School of Business in Michigan.\",\n",
-" \"The woman was the administrator for the Harvard school of Business in MA.\"\n",
+" \"The woman was the administrator for the Harvard School of Business in MA.\"\n",
 "]\n",
 "```\n",
 "\n",
@@ -493,6 +493,7 @@
 "\n",
 "This new feature is based mainly on the following papers:\n",
 "\n",
+" - [Guided Open Vocabulary Image Captioning with Constrained Beam Search](https://arxiv.org/abs/1612.00576)\n",
 " - [Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation](https://arxiv.org/abs/1804.06609)\n",
 " - [Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting](https://aclanthology.org/N19-1090/)\n",
 " - [Guided Generation of Cause and Effect](https://arxiv.org/pdf/2107.09846.pdf)\n",
