Features

Overview

The JPMML-Evaluator library is the de facto reference implementation of the PMML specification for the Java platform.

The primary objective is full compliance with the 4.X versions of the PMML specification (released since 2009). The secondary objective is maximum working compliance with the 3.X versions of the PMML specification (released between 2004 and 2007). This means that some PMML features whose scope is limited to 3.X versions (e.g. removed or deprecated in 4.X versions) may not be supported.

General structure

The JPMML-Evaluator library is hardwired to perform thorough "sanity" checking. Model evaluator classes will throw an exception when an invalid and/or unsupported PMML feature is encountered.

Data flow
  • Pre-processing of active fields (aka independent variables) according to the DataDictionary and MiningSchema elements:
    • Strict data type system:
      • Except for dateDaysSince[0] and dateTimeSecondsSince[0] data types.
    • Strict operational type system.
    • Treatment of outlier, missing and/or invalid values.
  • Model evaluation.
  • Post-processing of target fields (aka dependent variables) according to the Targets element:
    • Rescaling and/or casting regression results.
    • Replacing a missing regression result with the default value.
    • Replacing a missing classification result with the map of prior probabilities.
  • Calculation of auxiliary output fields according to the Output element:
    • Over 20 different result feature types:
      • Except for the standardError result feature.
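
The data flow above can be illustrated with a minimal, self-contained sketch for a single continuous field. It does not use the JPMML-Evaluator API: the class `DataFlowSketch`, its methods and the toy model `y = 2x + 1` are hypothetical. The parameter names merely echo the PMML attributes `outliers`, `lowValue`, `highValue`, `missingValueReplacement`, `rescaleFactor`, `rescaleConstant` and `defaultValue`. The sketch assumes that an outlier treated as missing is subsequently handled like any other missing value.

```java
import java.util.OptionalDouble;

// Illustrative only: none of these types belong to the JPMML-Evaluator API.
final class DataFlowSketch {

    // Mirrors the three values of the PMML "outliers" attribute.
    enum OutlierTreatment { AS_IS, AS_MISSING_VALUES, AS_EXTREME_VALUES }

    // Step 1: pre-process one continuous active field value.
    static OptionalDouble prepare(OptionalDouble raw, OutlierTreatment treatment,
            double lowValue, double highValue, OptionalDouble missingValueReplacement) {
        OptionalDouble value = raw;
        if (value.isPresent()) {
            double v = value.getAsDouble();
            if (v < lowValue || v > highValue) {
                switch (treatment) {
                    case AS_MISSING_VALUES:
                        value = OptionalDouble.empty();
                        break;
                    case AS_EXTREME_VALUES:
                        // Clamp the outlier into the [lowValue, highValue] range
                        value = OptionalDouble.of(Math.max(lowValue, Math.min(highValue, v)));
                        break;
                    default:
                        break; // AS_IS: keep the outlier unchanged
                }
            }
        }
        // Assumption: a value that became missing is replaced like any other missing value.
        return value.isPresent() ? value : missingValueReplacement;
    }

    // Step 2: a stand-in "model", y = 2x + 1; a missing input yields a missing result.
    static OptionalDouble evaluate(OptionalDouble x) {
        return x.isPresent()
            ? OptionalDouble.of(2.0 * x.getAsDouble() + 1.0)
            : OptionalDouble.empty();
    }

    // Step 3: post-process the regression result (rescale, or fall back to the default).
    static OptionalDouble postProcess(OptionalDouble result, double rescaleFactor,
            double rescaleConstant, OptionalDouble defaultValue) {
        if (!result.isPresent()) {
            return defaultValue;
        }
        return OptionalDouble.of(result.getAsDouble() * rescaleFactor + rescaleConstant);
    }

    // Chains the three steps with fixed, made-up settings.
    static OptionalDouble run(OptionalDouble raw) {
        OptionalDouble prepared = prepare(raw, OutlierTreatment.AS_EXTREME_VALUES,
            0.0, 100.0, OptionalDouble.empty());
        return postProcess(evaluate(prepared), 0.5, 0.0, OptionalDouble.of(-1.0));
    }
}
```

With these made-up settings, `DataFlowSketch.run(OptionalDouble.of(150.0))` clamps the input to 100 before evaluation, while `DataFlowSketch.run(OptionalDouble.empty())` falls through to the default value because no missing value replacement is configured.
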
Data manipulation

Model types

Supported model types:

Not yet supported model types:

Known limitations

  • Model composition. Model composition specifies a mechanism for embedding defeatured regression and decision tree models (represented by the Regression and DecisionTree elements, respectively) into other models. Model composition was deprecated in PMML schema version 4.1.
  • The ClusteringModel/CenterFields element. This element was removed in PMML schema version 3.2. PMML producers should move the list of DerivedField child elements to the ClusteringModel/LocalTransformations element, and reference them using a list of ClusteringField elements instead.
  • The MiningModel/Segmentation/LocalTransformations element. This element was deprecated in PMML schema version 4.1. PMML producers should move the list of DerivedField child elements to the MiningModel/LocalTransformations element instead.
  • The TableLocator element. The TableLocator element specifies a mechanism for incorporating data from external data sources (e.g. CSV files, databases). The TableLocator element is simply a placeholder in PMML schema version 4.3.

Inspection API

The class model object can be inspected for unsupported PMML elements and attributes using the visitor class org.jpmml.evaluator.visitors.UnsupportedMarkupInspector. This visitor collects all unsupported markup as instances of org.jpmml.manager.UnsupportedMarkupException.

The class model object is safe for evaluation using the JPMML-Evaluator library if the collection of exceptions is empty:

public boolean isFullySupported(PMML pmml){
  UnsupportedMarkupInspector inspector = new UnsupportedMarkupInspector();

  // Traverse the specified class model object in full
  pmml.accept(inspector);

  // Each unsupported element or attribute is reported as one exception
  List<UnsupportedMarkupException> exceptions = inspector.getExceptions();

  return exceptions.isEmpty();
}

The visitor class traverses the class model object completely. In contrast, actual model evaluator classes traverse the class model object only partially, because every "evaluation path" is a function of the specified input data record. It follows that the collection of exceptions represents the worst-case scenario. Evaluation using the JPMML-Evaluator library may succeed even if this collection is not empty.
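
The difference between complete traversal and a record-dependent evaluation path can be sketched with a toy decision tree. Everything here is hypothetical and independent of the JPMML-Evaluator class model: `WorstCaseSketch`, `Node`, `inspect`, `evaluate` and `sampleTree` are illustrative names, and the `supported` flag stands in for "markup that the evaluator can handle".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy decision tree, not the JPMML-Evaluator class model.
final class WorstCaseSketch {

    static final class Node {
        final String id;
        final boolean supported; // stand-in for "supported markup"
        final double threshold;  // split point, ignored at leaves
        final double score;      // prediction, used only at leaves
        final Node left;         // taken when x < threshold
        final Node right;        // taken otherwise

        Node(String id, boolean supported, double threshold, double score,
                Node left, Node right) {
            this.id = id;
            this.supported = supported;
            this.threshold = threshold;
            this.score = score;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // "Inspector": visits every node and collects the ids of all unsupported ones.
    static List<String> inspect(Node root) {
        List<String> unsupported = new ArrayList<>();
        collect(root, unsupported);
        return unsupported;
    }

    private static void collect(Node node, List<String> unsupported) {
        if (node == null) {
            return;
        }
        if (!node.supported) {
            unsupported.add(node.id);
        }
        collect(node.left, unsupported);
        collect(node.right, unsupported);
    }

    // "Evaluator": visits only the nodes on the path selected by the input value.
    static double evaluate(Node node, double x) {
        if (!node.supported) {
            throw new UnsupportedOperationException("Unsupported node: " + node.id);
        }
        if (node.isLeaf()) {
            return node.score;
        }
        return evaluate(x < node.threshold ? node.left : node.right, x);
    }

    // A root split at 10 with a supported "low" leaf and an unsupported "high" leaf.
    static Node sampleTree() {
        Node low = new Node("low", true, 0.0, 1.0, null, null);
        Node high = new Node("high", false, 0.0, 2.0, null, null);
        return new Node("root", true, 10.0, 0.0, low, high);
    }
}
```

For the sample tree, `inspect` reports the `high` leaf as unsupported, yet `evaluate(sampleTree(), 5.0)` succeeds because its path never reaches that leaf. Only inputs of 10 or more hit the unsupported node and fail.
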