Java Evaluator API for Predictive Model Markup Language (PMML).
JPMML-Evaluator is the de facto reference implementation of the PMML specification versions 3.0, 3.1, 3.2, 4.0, 4.1, 4.2 and 4.3 for the Java platform:
- Pre-processing of input fields according to the DataDictionary and MiningSchema elements:
  - Complete data type system.
  - Complete operational type system.
  - Treatment of outlier, missing and/or invalid values.
- Model evaluation.
- Post-processing of target fields according to the Targets element:
  - Rescaling and/or casting regression results.
  - Replacing a missing regression result with the default value.
  - Replacing a missing classification result with the map of prior probabilities.
- Calculation of auxiliary output fields according to the Output element:
  - Over 20 different result feature types.
- Model verification according to the ModelVerification element.
- Vendor extensions:
  - Java-backed model, expression and predicate types: integrate any third-party Java library into the PMML data flow.
  - MathML prediction reports.
For more information please see the features.md file.
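The regression post-processing steps listed above amount to simple arithmetic. The following is a minimal sketch, not the library's implementation: the class `TargetSketch` and its method names are hypothetical, and the order of operations (clamp, multiply by the rescale factor, add the rescale constant, cast) is an assumption that should be checked against the PMML Targets element documentation.

```java
// Hypothetical illustration of regression target post-processing; not part of JPMML-Evaluator
final class TargetSketch {

    private TargetSketch(){
    }

    // Replaces a missing (null) regression result with the default value
    public static Double orDefault(Double value, Double defaultValue){
        return (value != null ? value : defaultValue);
    }

    // Assumed order: clamp the value to [min, max], then compute clamped * factor + constant
    public static double rescale(double value, double min, double max, double factor, double constant){
        double clamped = Math.max(min, Math.min(max, value));
        return (clamped * factor) + constant;
    }

    // Casts a rescaled value to an integer by rounding to the nearest whole number
    public static long castRound(double value){
        return Math.round(value);
    }

    public static void main(String[] args){
        // A missing prediction falls back to the default value 12.0
        double prediction = orDefault(null, 12.0d);
        // 12.0 is clamped to 10.0, then rescaled to 10.0 * 2.0 + 1.0
        double rescaled = rescale(prediction, 0.0d, 10.0d, 2.0d, 1.0d);
        System.out.println(castRound(rescaled)); // prints 21
    }
}
```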
JPMML-Evaluator is interoperable with most popular statistics and data mining software:
- R and Rattle:
  - JPMML-R library and r2pmml package.
  - pmml and pmmlTransformations packages.
- Python and Scikit-Learn:
  - JPMML-SkLearn library and sklearn2pmml package.
- Apache Spark:
  - JPMML-SparkML library.
  - mllib.pmml.PMMLExportable interface.
- XGBoost:
  - JPMML-XGBoost library.
- LightGBM:
  - JPMML-LightGBM library.
- TensorFlow:
  - JPMML-TensorFlow library.
- KNIME
- RapidMiner
- SAS
- SPSS
JPMML-Evaluator is fast and memory efficient. It can deliver one million scorings per second even on a desktop computer.
JPMML-Evaluator requires Java 1.8 or newer.
JPMML-Evaluator library JAR files (together with accompanying Java source and Javadocs JAR files) are released via Maven Central Repository.
The current version is 1.4.1 (27 March, 2018).
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator</artifactId>
    <version>1.4.1</version>
</dependency>
<dependency>
    <groupId>org.jpmml</groupId>
    <artifactId>pmml-evaluator-extension</artifactId>
    <version>1.4.1</version>
</dependency>
JPMML-Evaluator depends on the JPMML-Model library for PMML class model.
Loading a PMML schema version 3.X or 4.X document into an org.dmg.pmml.PMML instance:
PMML pmml;
try(InputStream is = ...){
    pmml = org.jpmml.model.PMMLUtil.unmarshal(is);
}
The newly loaded PMML instance should be tailored by applying appropriate org.dmg.pmml.Visitor implementation classes to it:
- org.jpmml.model.visitors.LocatorTransformer. Transforms SAX Locator information into a Java-serializable representation. Recommended for development and testing environments.
- org.jpmml.model.visitors.LocatorNullifier. Removes SAX Locator information. Recommended for production environments.
- org.jpmml.model.visitors.<Type>Interner. Replaces all occurrences of the same PMML attribute value with the singleton attribute value.
- org.jpmml.evaluator.visitors.<Element>Interner. Replaces all occurrences of the same PMML element with the singleton element.
- org.jpmml.evaluator.visitors.<Element>Optimizer. Pre-parses the PMML element.
To facilitate their use, visitor classes have been grouped into visitor battery classes:
- org.jpmml.model.visitors.AttributeInternerBattery
- org.jpmml.evaluator.visitors.ElementInternerBattery
- org.jpmml.evaluator.visitors.ElementOptimizerBattery
Building and applying a custom visitor battery to reduce the memory consumption of a PMML instance in a production environment:
VisitorBattery visitorBattery = new VisitorBattery();
// Getting rid of SAX Locator information
visitorBattery.add(org.jpmml.model.visitors.LocatorNullifier.class);
// Getting rid of duplicate PMML attribute values and PMML elements
visitorBattery.addAll(new org.jpmml.model.visitors.AttributeInternerBattery());
visitorBattery.addAll(new org.jpmml.evaluator.visitors.ElementInternerBattery());
visitorBattery.applyTo(pmml);
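The memory saving that interning brings can be illustrated without the library. The following is a minimal sketch of the general idea only, namely replacing equal values with one shared instance. The class `InternerSketch` and its methods are hypothetical and do not reflect how the JPMML interner visitors are implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of interning; not part of JPMML-Evaluator
final class InternerSketch<T> {

    // Maps each distinct value to the first instance that was seen
    private final Map<T, T> pool = new HashMap<>();

    // Returns the canonical instance that is equal to the argument
    public T intern(T value){
        T canonical = pool.putIfAbsent(value, value);
        return (canonical != null ? canonical : value);
    }

    // Returns a new list in which every element is replaced with its canonical instance
    public List<T> internAll(List<T> values){
        List<T> result = new ArrayList<>(values.size());
        for(T value : values){
            result.add(intern(value));
        }
        return result;
    }

    public static void main(String[] args){
        InternerSketch<String> interner = new InternerSketch<>();
        // Two equal but distinct String instances
        String first = new String("setosa");
        String second = new String("setosa");
        System.out.println(first == second); // prints false
        System.out.println(interner.intern(first) == interner.intern(second)); // prints true
    }
}
```

After interning, duplicate instances are no longer referenced and become eligible for garbage collection, which is where the reduction in memory consumption comes from.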
The PMML standard defines a large number of model types.
The evaluation logic for each model type is encapsulated into a corresponding org.jpmml.evaluator.ModelEvaluator subclass.
Even though ModelEvaluator subclasses can be created directly, the recommended approach is to follow the factory design pattern as implemented by the org.jpmml.evaluator.ModelEvaluatorFactory factory class.
Obtaining and configuring a ModelEvaluatorFactory instance:
ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
// Activate the generation of MathML prediction reports
ValueFactoryFactory valueFactoryFactory = ReportingValueFactoryFactory.newInstance();
modelEvaluatorFactory.setValueFactoryFactory(valueFactoryFactory);
The model evaluator factory selects the first model from the PMML instance, and creates and configures a corresponding ModelEvaluator instance.
However, in order to promote loose coupling, it is advisable to cast the result to a much simplified org.jpmml.evaluator.Evaluator instance.
Obtaining an Evaluator instance for the PMML instance:
Evaluator evaluator = (Evaluator)modelEvaluatorFactory.newModelEvaluator(pmml);
Model evaluator classes follow functional programming principles and are completely thread-safe.
Model evaluator instances are fairly lightweight, which makes them cheap to create and destroy.
Nevertheless, long-running applications should maintain a one-to-one mapping between PMML and Evaluator instances for better performance.
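One way to maintain such a one-to-one mapping is a small cache that builds each evaluator on first request and reuses it afterwards. The following is a minimal sketch under stated assumptions: the class `EvaluatorCacheSketch` is hypothetical, the type parameter `K` stands for whatever identifies a model (for example a file path), and the loader function, which would load the PMML instance and create the Evaluator instance, is supplied by the application.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical helper; K is a model key (e.g. a file path), E is the evaluator type
final class EvaluatorCacheSketch<K, E> {

    private final ConcurrentMap<K, E> cache = new ConcurrentHashMap<>();

    // Loads the model and builds an evaluator for it; supplied by the application
    private final Function<K, E> loader;

    public EvaluatorCacheSketch(Function<K, E> loader){
        this.loader = loader;
    }

    // Returns the cached evaluator, building it on the first request only
    public E get(K key){
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args){
        // A String stands in for a real evaluator instance
        EvaluatorCacheSketch<String, String> evaluators = new EvaluatorCacheSketch<>(path -> "evaluator for " + path);
        String first = evaluators.get("model.pmml");
        String second = evaluators.get("model.pmml");
        System.out.println(first == second); // prints true
    }
}
```

Because model evaluators are thread-safe, the cached instance can be shared between all threads that score data with the same model.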
The model evaluator can be queried for the list of input (i.e. independent), target (i.e. primary dependent) and output (i.e. secondary dependent) field definitions, which provide information about the field name, data type, operational type, value domain, etc.
Querying and analyzing input fields:
List<InputField> inputFields = evaluator.getInputFields();
for(InputField inputField : inputFields){
    org.dmg.pmml.DataField pmmlDataField = (org.dmg.pmml.DataField)inputField.getField();
    org.dmg.pmml.MiningField pmmlMiningField = inputField.getMiningField();
    org.dmg.pmml.DataType dataType = inputField.getDataType();
    org.dmg.pmml.OpType opType = inputField.getOpType();
    switch(opType){
        case CONTINUOUS:
            RangeSet<Double> validArgumentRanges = inputField.getContinuousDomain();
            break;
        case CATEGORICAL:
        case ORDINAL:
            List<?> validArgumentValues = inputField.getDiscreteDomain();
            break;
        default:
            break;
    }
}
Querying and analyzing target fields:
List<TargetField> targetFields = evaluator.getTargetFields();
for(TargetField targetField : targetFields){
    org.dmg.pmml.DataField pmmlDataField = targetField.getDataField();
    org.dmg.pmml.MiningField pmmlMiningField = targetField.getMiningField(); // Could be null
    org.dmg.pmml.Target pmmlTarget = targetField.getTarget(); // Could be null
    org.dmg.pmml.DataType dataType = targetField.getDataType();
    org.dmg.pmml.OpType opType = targetField.getOpType();
    switch(opType){
        case CONTINUOUS:
            break;
        case CATEGORICAL:
        case ORDINAL:
            List<String> categories = targetField.getCategories();
            for(String category : categories){
                Object validResultValue = TypeUtil.parse(dataType, category);
            }
            break;
        default:
            break;
    }
}
Querying and analyzing output fields:
List<OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
    org.dmg.pmml.OutputField pmmlOutputField = outputField.getOutputField();
    org.dmg.pmml.DataType dataType = outputField.getDataType(); // Could be null
    org.dmg.pmml.OpType opType = outputField.getOpType(); // Could be null
    boolean finalResult = outputField.isFinalResult();
    if(!finalResult){
        continue;
    }
}
A model may contain verification data, which is a small but representative set of data records (inputs plus expected outputs) for ensuring that the model evaluator behaves correctly in this deployment configuration (variables such as the JPMML-Evaluator version, and the Java/JVM version and vendor). The model evaluator should be verified once, before putting it into actual use.
Performing the self-check:
evaluator.verify();
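Conceptually, the self-check compares each expected output with the value that the model evaluator actually produces. The following is a minimal sketch of such a comparison for numeric values. It assumes a relative tolerance plus a zero threshold, modelled on the `precision` and `zeroThreshold` attributes of the PMML VerificationField element. The class and method names are hypothetical, and the library's own acceptance criteria may differ in detail.

```java
// Hypothetical illustration of a numeric verification check; not part of JPMML-Evaluator
final class VerificationSketch {

    private VerificationSketch(){
    }

    // Returns true if the actual value is acceptably close to the expected value
    public static boolean isAcceptable(double expected, double actual, double precision, double zeroThreshold){
        // Expected values whose magnitude is at or below the zero threshold are treated as zero
        if(Math.abs(expected) <= zeroThreshold){
            return Math.abs(actual) <= zeroThreshold;
        }
        // Otherwise the allowed deviation is proportional to the magnitude of the expected value
        return Math.abs(actual - expected) <= precision * Math.abs(expected);
    }

    public static void main(String[] args){
        System.out.println(isAcceptable(0.5d, 0.5000001d, 1e-6, 1e-16)); // prints true
        System.out.println(isAcceptable(0.5d, 0.51d, 1e-6, 1e-16)); // prints false
    }
}
```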
During scoring, the application code should iterate over data records (e.g. rows of a table), and apply the following encode-evaluate-decode sequence of operations to each one of them.
The processing of the first data record will be significantly slower than the processing of all subsequent data records, because the model evaluator needs to look up, validate and pre-parse model content. If the model contains verification data, then this warm-up cost is borne during the self-check.
Preparing the argument map:
Map<FieldName, FieldValue> arguments = new LinkedHashMap<>();
List<InputField> inputFields = evaluator.getInputFields();
for(InputField inputField : inputFields){
    FieldName inputFieldName = inputField.getName();
    // The raw (i.e. user-supplied) value could be any Java primitive value
    Object rawValue = ...;
    // The raw value is passed through: 1) outlier treatment, 2) missing value treatment, 3) invalid value treatment and 4) type conversion
    FieldValue inputFieldValue = inputField.prepare(rawValue);
    arguments.put(inputFieldName, inputFieldValue);
}
Performing the evaluation:
Map<FieldName, ?> results = evaluator.evaluate(arguments);
Extracting primary results from the result map:
List<TargetField> targetFields = evaluator.getTargetFields();
for(TargetField targetField : targetFields){
    FieldName targetFieldName = targetField.getName();
    Object targetFieldValue = results.get(targetFieldName);
}
The target value is either a Java primitive value (as a wrapper object) or an instance of org.jpmml.evaluator.Computable:
if(targetFieldValue instanceof Computable){
    Computable computable = (Computable)targetFieldValue;
    Object unboxedTargetFieldValue = computable.getResult();
}
The target value may implement interfaces that descend from interface org.jpmml.evaluator.ResultFeature:
// Test for "entityId" result feature
if(targetFieldValue instanceof HasEntityId){
    HasEntityId hasEntityId = (HasEntityId)targetFieldValue;
    HasEntityRegistry<?> hasEntityRegistry = (HasEntityRegistry<?>)evaluator;
    BiMap<String, ? extends Entity> entities = hasEntityRegistry.getEntityRegistry();
    Entity winner = entities.get(hasEntityId.getEntityId());
    // Test for "probability" result feature
    if(targetFieldValue instanceof HasProbability){
        HasProbability hasProbability = (HasProbability)targetFieldValue;
        Double winnerProbability = hasProbability.getProbability(winner.getId());
    }
}
Extracting secondary results from the result map:
List<OutputField> outputFields = evaluator.getOutputFields();
for(OutputField outputField : outputFields){
    FieldName outputFieldName = outputField.getName();
    Object outputFieldValue = results.get(outputFieldName);
}
The output value is always a Java primitive value (as a wrapper object).
Module pmml-evaluator-example exemplifies the use of the JPMML-Evaluator library.
This module can be built using Apache Maven:
mvn clean install
The resulting uber-JAR file target/example-1.4-SNAPSHOT.jar contains the following command-line applications:
- org.jpmml.evaluator.EvaluationExample
- org.jpmml.evaluator.TestingExample
- org.jpmml.evaluator.EnhancementExample
Evaluating model model.pmml with data records from input.csv. The predictions are stored to output.csv:
java -cp target/example-1.4-SNAPSHOT.jar org.jpmml.evaluator.EvaluationExample --model model.pmml --input input.csv --output output.csv
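In application code, data records from a CSV file first need to be turned into per-row maps of raw values, which can then be fed into the "Preparing the argument map" step. The following is a minimal sketch of that conversion and is not the code of the EvaluationExample application. The class `CsvRecordsSketch` is hypothetical, and it assumes a header row, a comma separator and no quoted cells.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of CSV record parsing; not part of JPMML-Evaluator
final class CsvRecordsSketch {

    private CsvRecordsSketch(){
    }

    // Converts CSV lines (header first) into one column-name-to-cell map per data row
    public static List<Map<String, String>> parse(List<String> lines){
        List<Map<String, String>> records = new ArrayList<>();
        if(lines.isEmpty()){
            return records;
        }
        String[] header = lines.get(0).split(",", -1);
        for(String line : lines.subList(1, lines.size())){
            String[] cells = line.split(",", -1);
            Map<String, String> record = new LinkedHashMap<>();
            for(int i = 0; i < header.length; i++){
                // An absent or empty cell is treated as a missing value
                String cell = (i < cells.length ? cells[i] : "");
                record.put(header[i], cell.isEmpty() ? null : cell);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args){
        List<String> lines = Arrays.asList("Sepal.Length,Species", "5.1,setosa", "4.9,");
        for(Map<String, String> record : parse(lines)){
            System.out.println(record);
        }
    }
}
```

For a real file, the lines could be obtained with `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("input.csv"))`.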
Evaluating model model.pmml with data records from input.csv. The predictions are verified against data records from expected-output.csv:
java -cp target/example-1.4-SNAPSHOT.jar org.jpmml.evaluator.TestingExample --model model.pmml --input input.csv --expected-output expected-output.csv
Enhancing model model.pmml with verification data records from input_expected_output.csv:
java -cp target/example-1.4-SNAPSHOT.jar org.jpmml.evaluator.EnhancementExample --model model.pmml --verification input_expected_output.csv
Getting help:
java -cp target/example-1.4-SNAPSHOT.jar <application class name> --help
Limited public support is available via the JPMML mailing list.
JPMML-Evaluator is licensed under the GNU Affero General Public License (AGPL) version 3.0. Other licenses are available on request.
Please contact [email protected]