This module can be used to replace keywords in sentences or extract keywords from sentences.
```bash
$ pip install flashtext
```
- Extract keywords
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
>>> keywords_found
>>> # ['New York', 'Bay Area']
```
- Replace keywords
```python
>>> keyword_processor.add_keyword('New Delhi', 'NCR region')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
>>> new_sentence
>>> # 'I love New York and NCR region.'
```
- Case Sensitive example
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor(case_sensitive=True)
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
>>> # ['Bay Area']
```
- No clean name for Keywords
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
>>> # ['Big Apple', 'Bay Area']
```
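Several keywords can also share the same clean name. A minimal sketch, assuming the installed version provides `add_keywords_from_dict` (present in recent flashtext releases), which maps a clean name to a list of keywords:

```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> # map several keywords to the same clean name in one call
>>> keyword_processor.add_keywords_from_dict({'New York': ['Big Apple', 'NYC']})
>>> keyword_processor.extract_keywords('I love NYC and the Big Apple.')
>>> # ['New York', 'New York']
```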
For word-boundary detection, any character other than a word character (`\w`, i.e. `[A-Za-z0-9_]`) is currently treated as a word boundary.
- To set or add characters as part of word characters
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # ['Big Apple']
>>> keyword_processor.add_non_word_boundary('/')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # []
```
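To replace the whole word-character set at once rather than adding characters one at a time, a minimal sketch, assuming the installed version exposes `set_non_word_boundaries`, which takes the full set of characters to treat as part of words:

```python
>>> import string
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> # replace the default word-character set ([A-Za-z0-9_]); '/' is now part of a word
>>> keyword_processor.set_non_word_boundaries(set(string.digits + string.ascii_letters + '_/'))
>>> keyword_processor.add_keyword('Big Apple')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # []
```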
Documentation can be found at FlashText Read the Docs: https://flashtext.readthedocs.io/
```bash
$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext
$ pip install pytest
$ python setup.py test
```
```bash
$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext/docs
$ pip install sphinx
$ make html
$ # open _build/html/index.html in browser to view it locally
```
It is a custom algorithm based on the Aho-Corasick algorithm and a trie data structure.
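As an illustration only (not flashtext's actual implementation), a trie-based matcher works roughly like this: keywords are stored in a trie keyed by character, the input is scanned once, and a match is emitted when a keyword terminates at a word boundary. The sketch below is simplified (it does not check the start-of-word boundary that flashtext enforces), and the `_end` marker is a hypothetical name:

```python
# Illustrative sketch of trie-based keyword extraction; not flashtext's actual code.
def build_trie(keywords):
    trie = {}
    for word, clean_name in keywords.items():
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node['_end'] = clean_name          # '_end' marks a complete keyword (hypothetical marker)
    return trie

def extract(sentence, trie):
    found, i, n = [], 0, len(sentence)
    while i < n:
        node, j, match = trie, i, None
        while j < n and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            # accept only if the keyword ends at a word boundary
            if '_end' in node and (j == n or not sentence[j].isalnum()):
                match = (node['_end'], j)
        if match:
            found.append(match[0])
            i = match[1]                    # skip past the matched keyword
        else:
            i += 1
    return found

print(extract('I love Big Apple and Bay Area.',
              build_trie({'Big Apple': 'New York', 'Bay Area': 'Bay Area'})))
# ['New York', 'Bay Area']
```

Because the sentence is walked once regardless of how many keywords are loaded, lookup cost depends on the length of the text rather than on the number of keywords, which is where the speedup over a large regex alternation comes from.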

Doing the same with regex takes far longer:
| Docs count  | # Keywords | Regex    | FlashText    |
|-------------|------------|----------|--------------|
| 1.5 million | 2K         | 16 hours | Not measured |
| 2.5 million | 10K        | 15 days  | 15 mins      |
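A rough way to reproduce this kind of comparison on a small scale is sketched below. The corpus and keyword list are made up for illustration, and the timings it prints are not the benchmark numbers above; it only shows the shape of such a test (one regex alternation versus one `KeywordProcessor`):

```python
import re
import time
from flashtext.keyword import KeywordProcessor

# Hypothetical tiny corpus and keyword list, purely for illustration.
keywords = ['python %d' % i for i in range(1000)]
documents = ['I like python 42 and java but not python 999.'] * 10000

# Regex approach: one alternation over all keywords.
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(k) for k in keywords))
start = time.time()
regex_hits = sum(len(pattern.findall(doc)) for doc in documents)
print('regex     :', round(time.time() - start, 3), 's,', regex_hits, 'matches')

# FlashText approach: single pass per document over the keyword trie.
keyword_processor = KeywordProcessor()
for k in keywords:
    keyword_processor.add_keyword(k)
start = time.time()
flashtext_hits = sum(len(keyword_processor.extract_keywords(doc)) for doc in documents)
print('flashtext :', round(time.time() - start, 3), 's,', flashtext_hits, 'matches')
```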
The idea for this library came from a Stack Overflow question about speeding up keyword replacement across millions of documents.
- Issue Tracker: https://github.com/vi3k6i5/flashtext/issues
- Source Code: https://github.com/vi3k6i5/flashtext/
The project is licensed under the MIT license.