A fast and flexible Bayesian Naive Bayes implementation for the JVM written in Kotlin.
- Fully supports the online learning paradigm, in which data, and even new features, are added as they become available.
- Reasonably fast and memory efficient. We've trained a document classifier with tens of thousands of classes on hundreds of thousands of documents, and ironed out most of the hot-spots.
- Naturally works with few samples, by integrating out the uncertainty on estimated parameters.
- Models and data structures are immutable, which makes them concurrency friendly.
- Efficient serialization and deserialization using protobuf.
- Missing and unknown features at prediction time are properly handled.
- Minimal dependencies.
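To make the points about online learning, few samples, immutability and missing or unknown features more concrete, here is a minimal, self-contained sketch of the general technique. It is not blayze's implementation: the class `TinyBayes`, the single categorical feature and the pseudo-count `ALPHA` are all hypothetical names and choices made for illustration. It assumes a symmetric Dirichlet prior, so predictions use the posterior predictive (counts plus pseudo-counts) rather than a point estimate.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: immutable Naive Bayes over one categorical feature. */
final class TinyBayes {
    // Assumed symmetric Dirichlet pseudo-count; not a value taken from blayze.
    private static final double ALPHA = 1.0;

    private final Map<String, Integer> outcomeCounts;            // outcome -> sample count
    private final Map<String, Map<String, Integer>> valueCounts; // outcome -> value -> count

    TinyBayes() {
        this(new HashMap<>(), new HashMap<>());
    }

    private TinyBayes(Map<String, Integer> outcomeCounts,
                      Map<String, Map<String, Integer>> valueCounts) {
        this.outcomeCounts = outcomeCounts;
        this.valueCounts = valueCounts;
    }

    /** Online update: returns a new model and leaves this one untouched. Value must be non-null. */
    TinyBayes add(String value, String outcome) {
        Map<String, Integer> outcomes = new HashMap<>(outcomeCounts);
        outcomes.merge(outcome, 1, Integer::sum);
        // Shallow copy: only the inner map of the affected outcome is duplicated.
        Map<String, Map<String, Integer>> values = new HashMap<>(valueCounts);
        Map<String, Integer> perOutcome =
                new HashMap<>(values.getOrDefault(outcome, new HashMap<>()));
        perOutcome.merge(value, 1, Integer::sum);
        values.put(outcome, perOutcome);
        return new TinyBayes(outcomes, values);
    }

    /** Normalized outcome probabilities; pass null when the feature is missing. */
    Map<String, Double> predict(String value) {
        Set<String> vocabulary = new HashSet<>();
        for (Map<String, Integer> counts : valueCounts.values()) {
            vocabulary.addAll(counts.keySet());
        }
        int k = vocabulary.size() + 1; // one extra slot reserved for values never seen
        int total = 0;
        for (int n : outcomeCounts.values()) {
            total += n;
        }

        Map<String, Double> scores = new HashMap<>();
        double sum = 0.0;
        for (Map.Entry<String, Integer> e : outcomeCounts.entrySet()) {
            int n = e.getValue();
            // Posterior predictive of the outcome itself.
            double score = (n + ALPHA) / (total + ALPHA * outcomeCounts.size());
            if (value != null) { // a missing feature contributes no likelihood term
                int seen = valueCounts.get(e.getKey()).getOrDefault(value, 0);
                score *= (seen + ALPHA) / (n + ALPHA * k);
            }
            scores.put(e.getKey(), score);
            sum += score;
        }
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            e.setValue(e.getValue() / sum);
        }
        return scores;
    }
}
```

Because the pseudo-counts never let a probability reach zero, a value that was never observed still yields a usable prediction, and each `add` call can be made as data arrives. Copying the outer maps on every update is a simplification; a production library would likely use cheaper persistent structures.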
Get the latest artifact from Maven Central.
//Java 9
Model model = new Model().batchAdd(List.of(new Update( //Models are immutable
new Inputs( // Supports multiple feature types
Map.of( //Text features
"subject", "Attention, is it true?", //features are named.
"body", "Good day dear beneficiary. This is Secretary to president of Benin republic is writing this email ..." // multiple features of the same type have different names
),
Map.of( //Categorical features
"sender", "[email protected]"
),
Map.of( //Gaussian features
"n_words", 482.
)
),
"spam" // the outcome, in this case spam.
)));
Map<String, Double> predictions = model.predict(new Inputs(/*...*/));// e.g. {"spam": 0.624, "ham": 0.376}
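Since `predict` returns a plain `Map<String, Double>`, picking the most likely outcome needs only the standard library. The helper below is a small sketch; `TopPrediction` and `mostLikely` are hypothetical names, not part of the library.

```java
import java.util.Map;
import java.util.Optional;

final class TopPrediction {
    /** Returns the outcome with the highest probability, or empty if there are no predictions. */
    static Optional<String> mostLikely(Map<String, Double> predictions) {
        return predictions.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

Returning an `Optional` keeps the empty-map case explicit, for example when the model has not seen any outcomes yet.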
- Kotlin - Language
- Maven - Dependency Management
- Protocol Buffers - Serialization
We use SemVer for versioning.
- Create a Sonatype account.
  - The created username and password will be referred to as <sonatype_user> and <sonatype_pwd>, respectively.
- Create a Sonatype Jira ticket of type Publishing Support requesting access to com.tradeshift.blayze. It must be approved by an existing user with write access. It can take a couple of days before access is granted.
- Create a PR that updates the version in pom.xml along with the code changes. Merge it to master or v4 once it is approved.
- Generate a gpg key: gpg --gen-key
- List gpg keys: gpg --list-keys
- Extract the key id of the previously generated gpg key. It will be referred to as <gpg_key_id> from now on.
- Encrypt <sonatype_pwd> using mvn --encrypt-password. The encrypted value is referred to as <sonatype_pwd_enc>.
- Create a new server in ~/.m2/settings.xml:
<settings>
<servers>
<server>
<id>ossrh-blayze</id>
<username><sonatype_user></username>
<password><sonatype_pwd_enc></password>
</server>
</servers>
</settings>
- Run mvn clean deploy -P release -Dgpg.keyname=<gpg_key_id>
- For further details, check Sonatype documentation
We publish security updates for major version 4.x.x (branch v4) as well as 6.x.x (branch master).