Doc test readme (asyml#830)
* fixed the example and change TerminalReader -> StringReader

* change output format

* add test README.md workflow

* fix python version condition and pytest input file path

* test readme in a new build

* fixed test path

* add required package to README test

* add nltk

* remove pytest
hepengfe authored Jun 22, 2022
1 parent ac34f10 commit 3ed661e
Showing 2 changed files with 34 additions and 9 deletions.
21 changes: 21 additions & 0 deletions .github/workflows/main.yml
@@ -239,3 +239,24 @@ jobs:
           token: ${{ secrets.REPO_DISPATCH_PAT_HECTOR }}
           repository: asyml/forte-wrappers
           event-type: trigger-forte-wrappers

+  readme:
+    needs: build
+    runs-on: ubuntu-latest
+    env:
+      python-version: 3.9
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up Python ${{ env.python-version }}
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ env.python-version }}
+
+      - name: Test README.md when python version is 3.9
+        run: |
+          pip install mkcodes
+          pip install --progress-bar off .
+          pip install --progress-bar off forte.spacy nltk
+          mkcodes --github --output tests/temp_readme_test.py README.md
+          python tests/temp_readme_test.py
+          rm tests/temp_readme_test.py
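
This new job turns the README into an executable test: mkcodes (a third-party tool installed from PyPI) extracts the fenced code blocks from README.md into tests/temp_readme_test.py, which is then run as a plain Python script and deleted. Because that script runs non-interactively in CI, the README example below is switched from TerminalReader to StringReader and given a hard-coded input string. As a rough illustration only — assuming mkcodes simply concatenates the README's fenced `python` blocks in order, and that the NLTK tagger model still has to be downloaded (neither detail is part of this diff) — the generated file would look roughly like this:

```python
# Hypothetical reconstruction of tests/temp_readme_test.py: the README's
# fenced `python` blocks concatenated into one script (assumption: mkcodes
# emits them verbatim, in order).
import nltk

from forte import Pipeline
from forte.data.data_pack import DataPack
from forte.data.readers import StringReader
from forte.processors.base import PackProcessor
from fortex.spacy import SpacyProcessor
from ft.onto.base_ontology import Token

# Assumption: the NLTK POS-tagger model must be available locally; this
# download step is not shown in the diff above.
nltk.download("averaged_perceptron_tagger", quiet=True)


class NLTKPOSTagger(PackProcessor):
    def _process(self, input_pack: DataPack):
        # collect the token texts created by the upstream SpacyProcessor
        token_texts = [token.text for token in input_pack.get(Token)]
        # tag them with NLTK and write the tags back onto the Token entries
        taggings = nltk.pos_tag(token_texts)
        for token, tag in zip(input_pack.get(Token), taggings):
            token.pos = tag[1]


pipeline: Pipeline = Pipeline[DataPack]()
pipeline.set_reader(StringReader())
pipeline.add(SpacyProcessor(), {"processors": ["sentence", "tokenize"]})
pipeline.add(NLTKPOSTagger())

input_string = "Forte is a data-centric ML framework"
for pack in pipeline.initialize().process_dataset(input_string):
    for sentence in pack.get("ft.onto.base_ontology.Sentence"):
        print("The sentence is: ", sentence.text)
        for token in pack.get(Token, sentence):
            print(f" {token.text}[{token.pos}]", end=" ")
        print()
```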
22 changes: 13 additions & 9 deletions README.md
@@ -89,7 +89,6 @@ pip install forte.spacy
 Let's start by writing a simple processor that adds POS tags to tokens using the good old NLTK library.
 ```python
 import nltk
-
 from forte.processors.base import PackProcessor
 from forte.data.data_pack import DataPack
 from ft.onto.base_ontology import Token
@@ -105,14 +104,13 @@ class NLTKPOSTagger(PackProcessor):
     def _process(self, input_pack: DataPack):
         # get a list of token data entries from `input_pack`
         # using the `DataPack.get()` method
-        token_entries = input_pack.get(Token)
-        token_texts = [token.text for token in token_entries]
+        token_texts = [token.text for token in input_pack.get(Token)]
 
         # use nltk pos tagging module to tag token texts
         taggings = nltk.pos_tag(token_texts)
 
         # assign nltk taggings to token attributes
-        for token, tag in zip(token_entries, taggings):
+        for token, tag in zip(input_pack.get(Token), taggings):
             token.pos = tag[1]
 ```
If we break it down, we will notice there are two main functions.
@@ -127,31 +125,37 @@ a full pipeline.
 ```python
 from forte import Pipeline
 
-from forte.data.readers import TerminalReader
+from forte.data.readers import StringReader
 from fortex.spacy import SpacyProcessor
 
 pipeline: Pipeline = Pipeline[DataPack]()
-pipeline.set_reader(TerminalReader())
+pipeline.set_reader(StringReader())
 pipeline.add(SpacyProcessor(), {"processors": ["sentence", "tokenize"]})
 pipeline.add(NLTKPOSTagger())
 ```

 Here we have successfully created a pipeline with a few components:
-* a `TerminalReader` that reads data from terminal
+* a `StringReader` that reads data from a string,
 * a `SpacyProcessor` that calls SpaCy to split the sentences and perform tokenization,
 * and finally the brand new `NLTKPOSTagger` we just implemented.

Let's see it run in action!

 ```python
-for pack in pipeline.initialize().process_dataset():
+input_string = "Forte is a data-centric ML framework"
+for pack in pipeline.initialize().process_dataset(input_string):
     for sentence in pack.get("ft.onto.base_ontology.Sentence"):
         print("The sentence is: ", sentence.text)
         print("The POS tags of the tokens are:")
         for token in pack.get(Token, sentence):
-            print(f" {token.text}({token.pos})", end = " ")
+            print(f" {token.text}[{token.pos}]", end = " ")
         print()
 ```
+It gives us output as follows:
+
+```
+Forte[NNP] is[VBZ] a[DT] data[NN] -[:] centric[JJ] ML[NNP] framework[NN] .[.]
+```

 We have successfully created a simple pipeline. In a nutshell, the `DataPack`s are
 the standard packages "flowing" on the pipeline. They are created by the reader, and
