[prakriya] Reduce runtime of batch mode by ~40%
Before: 14.39s
After: 8.73s

This large commit contains various small projects with implications for performance and correctness. Not all of them are complete, but we are close to a point of diminishing returns, and I think it is better to ship something out rather than keep iterating.

Some major changes:

- Update our cache hashing logic to use FxHasher instead of DefaultHasher. This results in a ~16% speedup.
- Fix a performance issue in `find_next_not_empty`. This results in a ~10% speedup.
- Remove CharPrakriya in favor of IndexPrakriya, which iterates over char indices without creating a temporary string. This results in a ~5% speedup.
- Start a long migration to reduce string comparisons in favor of enum comparisons. For some early work in this direction, see `internal.rs` and where its types are used, as well as the new `morph` field on `Term` and how we use it in favor of the old `lakshanas` field. Early efforts show modest results, but even if these efforts don't speed up the program, they greatly simplify our logic and feel more correct.

Additional changes:

- Update integration testing setup to avoid building so many binaries at once. This reduces test runtime by around 66%.

Since I don't know when I'll be able to return to performance work, here are some notes on what else we might try to further reduce runtime:

- Update `angasya` code to avoid redundant work, specifically around applying dhatu rules to non-dhatus and vice versa.
- If possible, avoid running `run_main_rules` for most sanAdi dhatus. We might do the same for `atmanepada` rules.
- Continue migrating away from direct string comparisons and expand the new `Aupadeshika` enum to include all dhatu strings. We can try this approach out quickly by using a script to find/replace these strings with their enum counterparts.
- Explore "deep caching" to support, e.g., a standard prakriya that uses optional rules only in the `tripadi` phase.
- For fast string-enum conversions, use either a binary search or a perfect hash function. This same approach works when checking whether or not a string is part of some gana.
- Experiment with `CompactString` again and see if we can get a speed-up by caching the `adi` and `antya` values of a string. We should pursue this only if the approaches above don't work.
- Update `IndexPrakriya` to maintain a char counter so that we can skip rules that are out of scope.
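
The note on string-enum conversions via binary search could be sketched roughly as follows. This is an illustration only: the `Dhatu` enum, its variants, the `TABLE` contents, and the function names are hypothetical stand-ins, not the actual `Aupadeshika` enum or any code from this commit.

```rust
/// Hypothetical stand-in for an enum of dhatu strings; variants are illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Dhatu {
    Bhu,
    Gam,
    Kf,
    Pac,
}

/// Lookup table from string to enum value.
/// Assumption: keys are kept sorted in byte order, which binary search requires.
const TABLE: &[(&str, Dhatu)] = &[
    ("BU", Dhatu::Bhu),
    ("gam", Dhatu::Gam),
    ("kf", Dhatu::Kf),
    ("pac", Dhatu::Pac),
];

/// Converts a string to its enum value in O(log n) string comparisons.
/// Returns `None` if the string is not in the table.
pub fn dhatu_from_str(s: &str) -> Option<Dhatu> {
    TABLE
        .binary_search_by(|&(key, _)| key.cmp(s))
        .ok()
        .map(|i| TABLE[i].1)
}

/// Checks that the table is strictly sorted, so the binary search stays valid
/// as entries are added. Intended for use in a unit test.
pub fn table_is_sorted() -> bool {
    TABLE.windows(2).all(|pair| pair[0].0 < pair[1].0)
}
```

The same table shape would also answer gana membership: a sorted slice of strings plus `binary_search` gives a yes/no result without an enum payload. A perfect hash function, the other option mentioned above, trades the sorted-table invariant for a build-time generation step.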
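
For the hashing change listed under the major changes, here is a minimal sketch of how a non-default hasher plugs into a standard `HashMap`. To stay self-contained it uses a hand-written FNV-1a hasher as a stand-in; it is not FxHasher, is not the code from this commit, and is not expected to reproduce the measured speedup. The names `Fnv1a`, `FastMap`, and `hash_key` are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Stand-in hasher (FNV-1a, 64-bit). Assumption: used here only to show the
/// wiring; the commit itself uses FxHasher.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        Fnv1a(0xcbf29ce484222325)
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that builds a fresh `Fnv1a` per lookup instead of the default hasher.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

/// Hashes a single key directly, e.g. to derive a cache key.
pub fn hash_key<T: Hash>(key: &T) -> u64 {
    let mut hasher = Fnv1a::default();
    key.hash(&mut hasher);
    hasher.finish()
}
```

The standard library's default hasher is designed to resist hash-flooding attacks, which costs some speed. For an internal cache whose keys are not attacker-controlled, a simpler hasher is a common trade-off.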