Commit
* Use `resize` instead of `append` to pad features

**This Commit**
Replaces various instances of `append(&mut vec![...])` with `resize(...)`.

**Why?**
As a micro-optimization. I don't expect this to affect any benchmarks, since the change is tiny compared to the time the model takes to do anything. Still, I ran a small benchmark and this seemed to be the fastest approach because (I think):

- we allocate each attention mask exactly once, at the correct capacity
- we don't allocate a new vector to append to the existing one[^1]

And, while this won't speed up anything in practice, I think it might read more clearly: `resize` states the final length, so we can see that all the vectors end up the same length and match `max_len`.

<details>
<summary>Micro benchmark run</summary>

The Rust code was roughly:

```rust
// Build the ones, then append a freshly allocated vector of zeros.
pub fn append(len: usize, max: usize) -> Vec<usize> {
    let mut v = vec![1; len];
    v.append(&mut vec![0; max - v.len()]);
    v
}

// Allocate the full capacity once, then resize twice.
pub fn resize(len: usize, max: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(max);
    v.resize(len, 1);
    v.resize(max, 0);
    v
}

// Allocate only the ones, then grow to the final length.
pub fn halfway(len: usize, max: usize) -> Vec<usize> {
    let mut v = vec![1; len];
    v.resize(max, 0);
    v
}

// Fill everything with ones, then overwrite the tail with zeros.
pub fn overwrite(len: usize, max: usize) -> Vec<usize> {
    let mut v = vec![1; max];
    v[len..max].fill(0);
    v
}
```

and the parameters were roughly:

```rust
for size in [10, 500, 1000, 5000] {
    for max in [size, size + 1, size * 2, size * 10] {
```

`resize` was consistently the fastest. `halfway` was similar most of the time but consistently slightly slower. `overwrite` was slower than both, for reasons I don't understand, and `append` was consistently the slowest (though, of course, the difference was very small when appending zero or one elements).

</details>

[^1]: I can't really read assembly, but in [this small godbolt example][0] I see `__rust_alloc`, `__rust_alloc_zeroed`, and `do_reserve_and_handle`, so I don't think the compiler sees the upcoming allocation and handles it all in the initial allocation.

[0]: https://godbolt.org/z/eTsnjn9Tq

* Padding simplification for sequence generation pipelines

* Move call to `.get_pad_id` outside loop

**Why?**
Because it's the same for every iteration. See [this comment][0] for more details.

[0]: https://github.com/guillaume-be/rust-bert/pull/254/files#r873138871

* Remove comments on `pad_features`

**Why?**
I tried to add some comments but didn't understand the problem space well enough to correctly document what the returned masks do. See [this comment][0] for more details.

[0]: https://github.com/guillaume-be/rust-bert/pull/254/files#r873138314

Co-authored-by: Guillaume Becquin <[email protected]>
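
For illustration, the `resize`-based padding described in the first change could look roughly like the sketch below. The helper name `pad_with_mask`, its signature, and the truncation of overlong inputs are assumptions made for this example; it is not the actual `pad_features` code from rust-bert.

```rust
/// Hypothetical helper (not part of rust-bert): pads `token_ids` to `max_len`
/// with `pad_id` and builds an attention mask of the same final length.
pub fn pad_with_mask(
    mut token_ids: Vec<i64>,
    max_len: usize,
    pad_id: i64,
) -> (Vec<i64>, Vec<i64>) {
    // Number of real (non-padding) tokens that survive. Assumption: inputs
    // longer than `max_len` are simply truncated.
    let len = token_ids.len().min(max_len);

    // Allocate the mask once at its final capacity, then resize twice:
    // ones for real tokens, zeros for padding.
    let mut attention_mask = Vec::with_capacity(max_len);
    attention_mask.resize(len, 1);
    attention_mask.resize(max_len, 0);

    // `resize` states the final length directly; it pads with `pad_id`
    // when the vector is shorter and truncates when it is longer.
    token_ids.resize(max_len, pad_id);

    (token_ids, attention_mask)
}
```

Calling `pad_with_mask(vec![101, 7, 102], 5, 0)` would return the ids `[101, 7, 102, 0, 0]` and the mask `[1, 1, 1, 0, 0]`, so both vectors visibly end at `max_len`.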