[prakriya] Reduce runtime of batch mode by ~40%
Before: 14.39s
After: 8.73s

This large commit contains various small projects with implications for
performance and correctness. Not all of them are complete, but we are
close to a point of diminishing returns, and I think it is better to
ship something out rather than keep iterating.

Some major changes:

- Update our cache hashing logic to use FxHasher instead of
  DefaultHasher. This results in a ~16% speedup.

- Fix a performance issue in `find_next_not_empty`. This results in a
  ~10% speedup.

- Remove CharPrakriya in favor of IndexPrakriya, which iterates over
  char indices without creating a temporary string. This results in a
  ~5% speedup.

- Start a long migration to reduce string comparisons in favor of enum
  comparisons. For some early work in this direction, see `internal.rs`
  and where its types are used, as well as the new `morph` field on
  `Term` and how we use it in favor of the old `lakshanas` field. Early
  efforts show modest results, but even if these efforts don't speed up
  the program, they greatly simplify our logic and feel more correct.
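The hasher swap in the first bullet can be sketched with the standard library alone. This is only an illustration of how a `HashMap` is parameterized over its hasher: `ToyFastHasher`, `FastMap`, and `build_cache` are hypothetical names, and the mixing function is a simplified stand-in, not the implementation in the `rustc-hash` crate. In the real code, `FxHasher` from `rustc-hash` would presumably take the place of `ToyFastHasher`.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical stand-in for a fast, non-cryptographic hasher.
/// NOT the rustc-hash implementation; it only shows the plumbing.
#[derive(Default)]
struct ToyFastHasher {
    state: u64,
}

impl Hasher for ToyFastHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Cheap rotate/xor/multiply mixing, one byte at a time.
        for &b in bytes {
            self.state = (self.state.rotate_left(5) ^ u64::from(b))
                .wrapping_mul(0x517c_c1b7_2722_0a95);
        }
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

/// A map that uses the toy hasher instead of the default SipHash-based one.
type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<ToyFastHasher>>;

/// Builds a small cache keyed by strings (the keys are made-up examples).
fn build_cache() -> FastMap<String, usize> {
    let mut cache: FastMap<String, usize> = FastMap::default();
    cache.insert("BU".to_string(), 1);
    cache.insert("gam".to_string(), 2);
    cache
}

fn main() {
    let cache = build_cache();
    println!("{:?}", cache.get("BU"));
}
```

The trade-off is the usual one: a simpler hash function is cheaper per lookup but offers no resistance to adversarial keys, which is acceptable for an internal cache.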

Additional changes:

- Update integration testing setup to avoid building so many binaries at
  once. This reduces test runtime by around 66%.

Since I don't know when I'll be able to return to performance work, here
are some notes on what else we might try to further reduce runtime:

- Update `angasya` code to avoid redundant work, specifically around
  applying dhatu rules to non-dhatus and vice versa.

- If possible, avoid running `run_main_rules` for most sanAdi dhatus. We
  might do the same for `atmanepada` rules.

- Continue migrating away from direct string comparisons and expand the
  new `Aupadeshika` enum to include all dhatu strings. We can try this
  approach out quickly by using a script to find/replace these strings
  with their enum counterparts.

- Explore "deep caching" to support, e.g., a standard prakriya that uses
  optional rules only in the `tripadi` phase.

- For fast string-enum conversions, use either a binary search or a
  perfect hash function. This same approach works when checking whether
  or not a string is part of some gana.

- Experiment with `CompactString` again and see if we can get a speed-up
  by caching the `adi` and `antya` values of a string. We should pursue
  this only if the approaches above don't work.

- Update `IndexPrakriya` to maintain a char counter so that we can skip
  rules that are out of scope.
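As a rough sketch of the binary-search idea in the notes above: keep a table sorted by string key and look strings up with `binary_search_by`. Everything here is hypothetical (`ToyDhatu`, `TOY_TABLE`, `TOY_GANA`, and the sample keys); it is not the `Aupadeshika` enum from this commit, only an illustration of the lookup technique.

```rust
/// Hypothetical enum standing in for a much larger set of dhatu values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyDhatu {
    Bhu,
    Ak,
    Gam,
    Kf,
}

/// Lookup table, which MUST stay sorted by key in byte order
/// (uppercase ASCII sorts before lowercase).
const TOY_TABLE: &[(&str, ToyDhatu)] = &[
    ("BU", ToyDhatu::Bhu),
    ("ak", ToyDhatu::Ak),
    ("gam", ToyDhatu::Gam),
    ("kf", ToyDhatu::Kf),
];

/// Hypothetical gana membership list, also sorted in byte order.
const TOY_GANA: &[&str] = &["BU", "ak"];

/// Converts a string to its enum value in O(log n) comparisons.
fn dhatu_from_str(s: &str) -> Option<ToyDhatu> {
    TOY_TABLE
        .binary_search_by(|probe| probe.0.cmp(s))
        .ok()
        .map(|i| TOY_TABLE[i].1)
}

/// Checks gana membership with the same sorted-slice approach.
fn in_toy_gana(s: &str) -> bool {
    TOY_GANA.binary_search(&s).is_ok()
}

fn main() {
    println!("{:?}", dhatu_from_str("gam"));
    println!("{}", in_toy_gana("BU"));
}
```

A perfect hash function would replace the logarithmic search with a single probe, at the cost of generating the table ahead of time.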
akprasad committed Dec 1, 2024
1 parent 2b2dd88 commit c978e10
Showing 121 changed files with 3,922 additions and 3,288 deletions.
17 changes: 12 additions & 5 deletions Cargo.lock

1 change: 1 addition & 0 deletions vidyut-prakriya/Cargo.toml
@@ -24,6 +24,7 @@ rayon = "1.6.1"
 wasm-bindgen = "0.2"
 serde-wasm-bindgen = "0.4"
 console_error_panic_hook = "0.1.7"
+rustc-hash = "2.0.0"

 [dev-dependencies]
 test_utils = { path = "test_utils" }
4 changes: 2 additions & 2 deletions vidyut-prakriya/Makefile
@@ -60,7 +60,7 @@ test_tinantas:
 --hash "6964b61ad01cfe81b9b8c58cee957cb10db4462cecfded394ebb4631011c742a"
 ../target/release/test_tinantas \
 --test-cases test-files/tinantas-nic-kartari.csv \
---hash "b95f59ca9b070c2eb09edefbea198ef6d4089f28fb494f06993a7a64864fec61"
+--hash "68356a81e3b19ba979d549af983ac20b26fd2b98fa719a68dfedd580cf12672a"
 ../target/release/test_tinantas \
 --test-cases test-files/tinantas-san-kartari.csv \
 --hash "f6d9f65013aa7b420ce57e138872a1acf72285ccfa860a0b2bcf3789e9867a79"
@@ -69,7 +69,7 @@ test_tinantas:
 --hash "3badfc693401fb722c640875e06fb3169bd8e04176503c6501a1cd2f24fcec37"
 ../target/release/test_tinantas \
 --test-cases test-files/tinantas-yan-luk-kartari.csv \
---hash "8e55532333708504593040b4b1025a8568f6d23a07cedd3a8f31e155ae0cc2fb"
+--hash "e1fcbd4fa51885e2547a62a81a9e797483ca7905c5e961c622f5c49be454e1d4"
 ../target/release/test_tinantas \
 --test-cases test-files/tinantas-san-nic-kartari.csv \
 --hash "0758ec35cdd3867f8d58c9854ca60a5f09f919db375ffb2f83cd2dd117a0b4ab"
2 changes: 1 addition & 1 deletion vidyut-prakriya/examples/bench.rs
@@ -31,7 +31,7 @@ fn run(dhatupatha: Dhatupatha) -> Result<(), Box<dyn Error>> {
 let dhatu_sample: Vec<_> = dhatupatha
 .iter()
 .enumerate()
-.filter(|(i, _)| i % 10 == 0)
+.filter(|(i, _)| *i % 10 == 0)
 .map(|(_, x)| x)
 .collect();

