Partial python port of java SRX segmenter, originally written by Jarek Lipski aka loomchild
.
In a nutshell it allows you to tokenize texts into the sentences (but generally it's rule-based, so you can chop anything textual).
Shipped with segment.srx
set of segmentation rules for different languages, crafted by the great team of languagetool.
- Python port: Dmytro Chaplynskyi
- Original Java implementation: Jarek Lipski
- Segmentation rules: Daniel Naber, Jaume Ortolà et al (153 contributors!)
- Special thanks to Andriy Rysin