omar93/tokenizer

tokenizer

A tokenizer that can be installed with: npm i tokenizer-1dv610. Visit the GitHub repo for more information if this page is out of date.

Installation

npm i tokenizer-1dv610

How to use

Initialize a new object of the class 'Tokenizer' and pass in two arguments: the first is the string you want to tokenize, and the second is an instance of the Grammar class, as shown below.

import { Tokenizer, Grammar } from 'tokenizer-1dv610'


let wordAndDotGrammar = new Grammar()

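// Each rule is anchored with ^ so it only matches at the start of the remaining input.
// The characters . and ? are regex metacharacters and must be escaped to match literally.
wordAndDotGrammar.addGrammar({'regex': /^[A-Za-zåäöÅÄÖ]+/g, 'type': 'word'})
wordAndDotGrammar.addGrammar({'regex': /^\./g, 'type': 'dot', 'sentenceEnding': 'normal sentence'})
wordAndDotGrammar.addGrammar({'regex': /^\?/g, 'type': 'question mark', 'sentenceEnding': 'question'})
wordAndDotGrammar.addGrammar({'regex': /^!/g, 'type': 'exclamation mark', 'sentenceEnding': 'announcement'})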

let tokenizer = new Tokenizer('Hello this is a sentence.',wordAndDotGrammar)
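To illustrate why each grammar rule is anchored with ^, here is a minimal, self-contained sketch of how a rule-based tokenizer might consume a string. It is not the implementation used by tokenizer-1dv610: the `tokenize` function, the plain `rules` array, the whitespace skipping, and the longest-match tie-breaking are all assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT the implementation used by tokenizer-1dv610.
// Rules mirror the grammar above; the g flag is omitted so exec() keeps no state.
const rules = [
  { regex: /^[A-Za-zåäöÅÄÖ]+/, type: 'word' },
  { regex: /^\./, type: 'dot' },
  { regex: /^\?/, type: 'question mark' },
  { regex: /^!/, type: 'exclamation mark' }
]

// Assumed behavior: skip whitespace between tokens and pick the longest match.
function tokenize (input, rules) {
  const tokens = []
  let rest = input.trimStart()
  while (rest.length > 0) {
    let best = null
    for (const rule of rules) {
      const match = rule.regex.exec(rest)
      if (match && (best === null || match[0].length > best.value.length)) {
        best = { type: rule.type, value: match[0] }
      }
    }
    // Fail loudly if nothing matches, or if a rule matched the empty string.
    if (best === null || best.value.length === 0) {
      throw new Error(`No rule matches at: "${rest}"`)
    }
    tokens.push(best)
    // Drop the matched text so the next ^-anchored match starts right after it.
    rest = rest.slice(best.value.length).trimStart()
  }
  return tokens
}

const tokens = tokenize('Hello this is a sentence.', rules)
console.log(tokens.map(t => `${t.type}:${t.value}`).join(' '))
// prints: word:Hello word:this word:is word:a word:sentence dot:.
```

Because every regex starts with ^, a rule can only match the text directly at the current position, which is what lets the loop walk through the string one token at a time.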

To get the current token use:

tokenizer.getCurrentToken()

To step to the next token use:

tokenizer.next()

To step to the previous token use:

tokenizer.previous()

To change the string use:

tokenizer.setString('new string')

Example:

Using the string 'Hello this is a sentence.' with the code above:

console.log(tokenizer.getCurrentToken())

output: "Hello"

tokenizer.next()
tokenizer.next()

console.log(tokenizer.getCurrentToken())

output: "is"
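The stepping behavior in the example can be pictured as a cursor moving over a list of tokens. The sketch below is a hypothetical stand-in, not the package's code: the `TokenCursor` class, its plain string tokens, and the choice to throw a `RangeError` at either end are assumptions, since this README does not say what the real Tokenizer does at the boundaries.

```javascript
// Hypothetical sketch: NOT the implementation used by tokenizer-1dv610.
class TokenCursor {
  constructor (tokens) {
    this.tokens = tokens
    this.index = 0 // start at the first token
  }

  getCurrentToken () {
    return this.tokens[this.index]
  }

  next () {
    // Assumption: stepping past the last token is treated as an error.
    if (this.index >= this.tokens.length - 1) {
      throw new RangeError('Already at the last token')
    }
    this.index += 1
  }

  previous () {
    // Assumption: stepping before the first token is treated as an error.
    if (this.index <= 0) {
      throw new RangeError('Already at the first token')
    }
    this.index -= 1
  }
}

const cursor = new TokenCursor(['Hello', 'this', 'is', 'a', 'sentence', '.'])
console.log(cursor.getCurrentToken()) // prints: Hello
cursor.next()
cursor.next()
console.log(cursor.getCurrentToken()) // prints: is
cursor.previous()
console.log(cursor.getCurrentToken()) // prints: this
```

Throwing at the boundaries is only one possible design; returning a special end token or staying in place would be equally reasonable choices.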

About

A tokenizer for the course 1dv610
