A tokenizer that can be installed with npm. Visit the GitHub repo for more info if this page is not up to date.
Install it with:
npm i tokenizer-1dv610
Initialize a new object of the class 'Tokenizer' and pass in two arguments: the first is the string you want to tokenize, and the second is an instance of the 'Grammar' class, as shown below:
import { Tokenizer, Grammar } from 'tokenizer-1dv610'
let wordAndDotGrammar = new Grammar()
// A word is one or more letters, including the Swedish å, ä and ö
wordAndDotGrammar.addGrammar({'regex':/^[A-Za-zåäöÅÄÖ]+/g, 'type':'word'})
// '.' and '?' must be escaped: /^./ matches any character and /^?/ is a syntax error
wordAndDotGrammar.addGrammar({'regex':/^\./g, 'type':'dot', 'sentenceEnding':'normal sentence'})
wordAndDotGrammar.addGrammar({'regex':/^\?/g, 'type':'question mark', 'sentenceEnding':'question'})
wordAndDotGrammar.addGrammar({'regex':/^!/g, 'type':'exclamation mark', 'sentenceEnding':'announcement'})
let tokenizer = new Tokenizer('Hello this is a sentence.',wordAndDotGrammar)
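The package's internals are not described here, so the following is only a hypothetical sketch of how a grammar of regexes anchored with `^` can split a string: leading whitespace is skipped, every rule is tried in order against the start of the remaining input, and the first rule that matches produces a token. The `tokenize` function, the plain `{ regex, type }` rule objects and the `{ type, value }` token shape are assumptions for illustration, not the tokenizer-1dv610 API, and the real package may pick between rules differently (for example by longest match).

```javascript
// Hypothetical sketch: NOT the tokenizer-1dv610 implementation.
// Each rule is { regex, type }; every regex is assumed to be anchored with ^
// so it can only match at the start of the remaining input.
function tokenize(input, rules) {
  const tokens = []
  let rest = input.trimStart() // skip leading whitespace
  while (rest.length > 0) {
    let token = null
    for (const { regex, type } of rules) {
      // Copy the regex without the g flag so leftover lastIndex state cannot interfere.
      const re = new RegExp(regex.source, regex.flags.replace('g', ''))
      const m = re.exec(rest)
      if (m && m[0].length > 0) {
        token = { type, value: m[0] } // first matching rule wins
        break
      }
    }
    if (token === null) throw new Error(`No grammar rule matches: "${rest}"`)
    tokens.push(token)
    // Drop the matched text and any whitespace before the next token.
    rest = rest.slice(token.value.length).trimStart()
  }
  return tokens
}

const rules = [
  { regex: /^[A-Za-zåäöÅÄÖ]+/g, type: 'word' },
  { regex: /^\./g, type: 'dot' },
  { regex: /^\?/g, type: 'question mark' },
  { regex: /^!/g, type: 'exclamation mark' },
]

console.log(tokenize('Hello this is a sentence.', rules))
```

Throwing on unmatched input is one design choice; a tokenizer could instead emit an "unknown" token and keep going.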
To get the current token use:
tokenizer.getCurrentToken()
To step to the next token use:
tokenizer.next()
To step to the previous token use:
tokenizer.previous()
To change the string use:
tokenizer.setString('new string')
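The four methods above treat the tokenizer as a cursor over a sequence of tokens. As a hypothetical sketch only, the class below mimics those method names on a plain array of token strings; it is NOT the package's code. The name `TokenCursor`, the simplified splitting regex, and the choice to stay on the first or last token instead of moving past the ends are all assumptions made for illustration.

```javascript
// Hypothetical sketch of a cursor over a token list. It mimics the method
// names above, but it is NOT tokenizer-1dv610's code, and its behavior at the
// ends of the input (staying put) is an assumption.
class TokenCursor {
  constructor(input) {
    this.setString(input)
  }

  // Replace the input, re-split it into tokens and rewind to the first token.
  setString(input) {
    // Simplified splitting: runs of letters, or a single '.', '?' or '!'
    this.tokens = input.match(/[A-Za-zåäöÅÄÖ]+|[.?!]/g) || []
    this.index = 0
  }

  // Returns undefined when the input contained no tokens.
  getCurrentToken() {
    return this.tokens[this.index]
  }

  // Move forward one token, but never past the last one.
  next() {
    if (this.index < this.tokens.length - 1) this.index++
  }

  // Move back one token, but never before the first one.
  previous() {
    if (this.index > 0) this.index--
  }
}

const cursor = new TokenCursor('Hello this is a sentence.')
console.log(cursor.getCurrentToken()) // Hello
cursor.next()
cursor.next()
console.log(cursor.getCurrentToken()) // is
cursor.previous()
console.log(cursor.getCurrentToken()) // this
cursor.setString('Hej då!')
console.log(cursor.getCurrentToken()) // Hej
```

Keeping the whole token array in memory makes previous() a simple index decrement; a lazy tokenizer that reads one token at a time would instead need to remember earlier positions.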
If you try it out with the sentence 'Hello this is a sentence.' from the code above:
console.log(tokenizer.getCurrentToken())
output: "Hello"
tokenizer.next()
tokenizer.next()
console.log(tokenizer.getCurrentToken())
output: "is"