This module can be used to replace keywords in sentences or extract keywords from sentences.
```bash
$ pip install flashtext
```
- Extract keywords
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
>>> keywords_found
>>> # ['New York', 'Bay Area']
```
- Replace keywords
```python
>>> keyword_processor.add_keyword('New Delhi', 'NCR region')
>>> new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
>>> new_sentence
>>> # 'I love New York and NCR region.'
```
- Case Sensitive example
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor(case_sensitive=True)
>>> keyword_processor.add_keyword('Big Apple', 'New York')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
>>> # ['Bay Area']
```
- No clean name for Keywords
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> keyword_processor.add_keyword('Bay Area')
>>> keywords_found = keyword_processor.extract_keywords('I love big Apple and Bay Area.')
>>> keywords_found
>>> # ['Big Apple', 'Bay Area']
```
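Several keywords can also share the same clean name. A minimal sketch, assuming the installed version provides `add_keywords_from_dict` (present in recent flashtext releases), which maps a clean name to a list of keywords:

```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> # map several keywords to the same clean name in one call
>>> keyword_processor.add_keywords_from_dict({'New York': ['Big Apple', 'NYC']})
>>> keyword_processor.extract_keywords('I love NYC and the Big Apple.')
>>> # ['New York', 'New York']
```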
For word-boundary detection, any character other than a word character (`\w`, i.e. `[A-Za-z0-9_]`) is currently treated as a word boundary.
- To set or add characters as part of word characters
```python
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> keyword_processor.add_keyword('Big Apple')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # ['Big Apple']
>>> keyword_processor.add_non_word_boundary('/')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # []
```
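To replace the whole word-character set at once rather than adding characters one at a time, a minimal sketch, assuming the installed version exposes `set_non_word_boundaries`, which takes the full set of characters to treat as part of words:

```python
>>> import string
>>> from flashtext.keyword import KeywordProcessor
>>> keyword_processor = KeywordProcessor()
>>> # replace the default word-character set ([A-Za-z0-9_]); '/' is now part of a word
>>> keyword_processor.set_non_word_boundaries(set(string.digits + string.ascii_letters + '_/'))
>>> keyword_processor.add_keyword('Big Apple')
>>> print(keyword_processor.extract_keywords('I love Big Apple/Bay Area.'))
>>> # []
```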
Documentation can be found at FlashText Read the Docs: https://flashtext.readthedocs.io/
```bash
$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext
$ pip install pytest
$ python setup.py test
```
```bash
$ git clone https://github.com/vi3k6i5/flashtext
$ cd flashtext/docs
$ pip install sphinx
$ make html
$ # open _build/html/index.html in browser to view it locally
```
It is a custom algorithm based on the Aho-Corasick algorithm and a trie data structure.
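As an illustration only (not flashtext's actual implementation), a trie-based matcher works roughly like this: keywords are stored in a trie keyed by character, the input is scanned once, and a match is emitted when a keyword terminates at a word boundary. The sketch below is simplified (it does not check the start-of-word boundary that flashtext enforces), and the `_end` marker is a hypothetical name:

```python
# Illustrative sketch of trie-based keyword extraction; not flashtext's actual code.
def build_trie(keywords):
    trie = {}
    for word, clean_name in keywords.items():
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node['_end'] = clean_name          # '_end' marks a complete keyword (hypothetical marker)
    return trie

def extract(sentence, trie):
    found, i, n = [], 0, len(sentence)
    while i < n:
        node, j, match = trie, i, None
        while j < n and sentence[j] in node:
            node = node[sentence[j]]
            j += 1
            # accept only if the keyword ends at a word boundary
            if '_end' in node and (j == n or not sentence[j].isalnum()):
                match = (node['_end'], j)
        if match:
            found.append(match[0])
            i = match[1]                    # skip past the matched keyword
        else:
            i += 1
    return found

print(extract('I love Big Apple and Bay Area.',
              build_trie({'Big Apple': 'New York', 'Bay Area': 'Bay Area'})))
# ['New York', 'Bay Area']
```

Because the sentence is walked once regardless of how many keywords are loaded, lookup cost depends on the length of the text rather than on the number of keywords, which is where the speedup over a large regex alternation comes from.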

Doing the same with regex takes far longer:
| Docs count  | # Keywords | Regex    | FlashText    |
|-------------|------------|----------|--------------|
| 1.5 million | 2K         | 16 hours | Not measured |
| 2.5 million | 10K        | 15 days  | 15 mins      |
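A rough way to reproduce this kind of comparison on a small scale is sketched below. The corpus and keyword list are made up for illustration, and the timings it prints are not the benchmark numbers above; it only shows the shape of such a test (one regex alternation versus one `KeywordProcessor`):

```python
import re
import time
from flashtext.keyword import KeywordProcessor

# Hypothetical tiny corpus and keyword list, purely for illustration.
keywords = ['python %d' % i for i in range(1000)]
documents = ['I like python 42 and java but not python 999.'] * 10000

# Regex approach: one alternation over all keywords.
pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(k) for k in keywords))
start = time.time()
regex_hits = sum(len(pattern.findall(doc)) for doc in documents)
print('regex     :', round(time.time() - start, 3), 's,', regex_hits, 'matches')

# FlashText approach: single pass per document over the keyword trie.
keyword_processor = KeywordProcessor()
for k in keywords:
    keyword_processor.add_keyword(k)
start = time.time()
flashtext_hits = sum(len(keyword_processor.extract_keywords(doc)) for doc in documents)
print('flashtext :', round(time.time() - start, 3), 's,', flashtext_hits, 'matches')
```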
The idea for this library came from a Stack Overflow question about speeding up keyword replacement across millions of documents.
- Issue Tracker: https://github.com/vi3k6i5/flashtext/issues
- Source Code: https://github.com/vi3k6i5/flashtext/
The project is licensed under the MIT license.