Feathr lets you:
- define features based on raw data sources, including time-series data, using simple HOCON configuration
- get those features by name during model training and model inference, using simple APIs
- share features across your team and company
Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.
Follow the quick start guide to try it out. For more details, read the documentation.
In features.conf:
anchors: {                                          // Feature anchors
    trip_features: {                                // A feature anchor
        source: nycTaxiBatchSource
        key: DOLocationID
        features: {                                 // Feature names in this anchor
            f_is_long_trip: "trip_distance > 30"    // A feature by an expression
            f_day_of_week: "dayofweek(datetime)"    // A feature with built-in function
        }
    }
}
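To make the two expressions above concrete, here is a minimal Python sketch of what they would compute for a single source row. It is an illustration only, not how Feathr evaluates features. It assumes the expressions are applied per row and that `dayofweek` follows the Spark SQL convention (1 = Sunday through 7 = Saturday); the sample row and its values are hypothetical.

```python
from datetime import datetime


def f_is_long_trip(row):
    # Mirrors the expression "trip_distance > 30".
    return row["trip_distance"] > 30


def f_day_of_week(row):
    # Mirrors "dayofweek(datetime)".
    # Assumption: Spark SQL convention, 1 = Sunday ... 7 = Saturday.
    # isoweekday() is 1 = Monday ... 7 = Sunday, so shift it by one day.
    return row["datetime"].isoweekday() % 7 + 1


# Hypothetical source row; DOLocationID is the anchor's key column.
row = {
    "DOLocationID": "265",
    "trip_distance": 42.5,
    "datetime": datetime(2021, 1, 3, 8, 30),  # a Sunday
}

features = {
    "f_is_long_trip": f_is_long_trip(row),
    "f_day_of_week": f_day_of_week(row),
}
```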
With CLI tool: feathr deploy
In feature-join.conf:
// Request dataset, used to join with features
observationPath: "abfss://[email protected]/demo_input/"
// Requested features to be joined
features: [
    {
        // features defined in your features.conf
        featureList: [f_is_long_trip, f_day_of_week]
    }
]
// The output becomes the training dataset with features joined
outputPath: "abfss://[email protected]/demo/demo_output/"
In my_offline_training.py:
# Prepare training data by joining features to the input (observation) data.
# feature-join.conf and features.conf are detected and used automatically.
from feathr import FeathrClient
client = FeathrClient()
result = client.join_offline_features()
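The point-in-time-correct join mentioned earlier can be illustrated with a small standard-library sketch. This is not Feathr's implementation; the function name `point_in_time_join` and the sample data are hypothetical. The idea is that each observation row only sees feature values whose timestamp is no later than the observation's own timestamp, so no future data leaks into training.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(observations, feature_rows):
    """Attach to each observation the latest feature value known at its timestamp.

    observations: list of (key, timestamp) tuples.
    feature_rows: list of (key, timestamp, value) tuples.
    """
    # Group feature rows by key, sorted by timestamp, so we can binary search.
    by_key = {}
    for key, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = by_key.setdefault(key, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for key, obs_ts in observations:
        times, values = by_key.get(key, ([], []))
        # Index of the last feature row with timestamp <= obs_ts, or -1 if none.
        idx = bisect_right(times, obs_ts) - 1
        joined.append((key, obs_ts, values[idx] if idx >= 0 else None))
    return joined


# Hypothetical feature history and observation data.
feature_rows = [
    ("265", datetime(2021, 1, 1), 10.0),
    ("265", datetime(2021, 1, 3), 30.0),
]
observations = [
    ("265", datetime(2021, 1, 2)),  # only the Jan 1 value is known yet
    ("265", datetime(2021, 1, 4)),  # the Jan 3 value is now the latest
    ("239", datetime(2021, 1, 2)),  # no feature history for this key
]
training_rows = point_in_time_join(observations, feature_rows)
```

A naive join on the key alone would give both `"265"` observations the Jan 3 value, which leaks future information into the Jan 2 row.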
In my_online_model.py:
from feathr import FeathrClient
client = FeathrClient()
# Get features for a locationId (key)
client.online_get_features(feature_table = "agg_features",
                           key = "265",
                           feature_names = ['f_location_avg_fare', 'f_location_max_fare'])
# Batch get for multiple locationIds (keys)
client.online_batch_get_features(feature_table = "agg_features",
                                 key = ["239", "265"],
                                 feature_names = ['f_location_avg_fare', 'f_location_max_fare'])
A feature anchor can also define window aggregation features:
anchors: {
    agg_features: {                             // A feature anchor (with aggregation)
        source: nyc_taxi_batch_source           // Features data source
        features: {
            f_location_avg_fare: {              // A feature with window aggregation
                aggregation: AVG                // Aggregation function
                def: "cast_float(fare_amount)"  // Aggregation expression
                window: 3d                      // Over a 3-day window
            }
        }
        key: LocationID                         // Query/join key of the feature (group)
    }
}
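As a rough illustration of what a 3-day `AVG` window over `cast_float(fare_amount)` means, the sketch below averages the fares for one key inside the window that ends at a given timestamp. It is a simplification under stated assumptions, not Feathr's implementation: the function `window_avg`, the sample rows, and the choice to exclude the window start and include the window end are all hypothetical.

```python
from datetime import datetime, timedelta


def window_avg(rows, key, as_of, window=timedelta(days=3)):
    """Average fare for `key` over the window ending at `as_of`.

    rows: list of (key, timestamp, fare_amount) tuples; fare_amount may be a string.
    Returns None when no row falls inside the window.
    """
    start = as_of - window
    # float(...) stands in for cast_float(fare_amount).
    # Assumption: the window is (start, as_of], i.e. start excluded, end included.
    fares = [
        float(fare)
        for k, ts, fare in rows
        if k == key and start < ts <= as_of
    ]
    return sum(fares) / len(fares) if fares else None


# Hypothetical trip records: (LocationID, timestamp, fare_amount).
rows = [
    ("265", datetime(2021, 1, 1, 12), "10.0"),  # older than 3 days at query time
    ("265", datetime(2021, 1, 3, 12), "20.0"),
    ("265", datetime(2021, 1, 4, 12), "30.0"),
    ("239", datetime(2021, 1, 4, 0), "99.0"),   # different key
]
avg_fare = window_avg(rows, "265", as_of=datetime(2021, 1, 5))
```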
sources: {                              // Named data sources
    nyc_taxi_batch_source: {            // A data source
        location: { path: "abfss://[email protected]/demo_data/" }
        timeWindowParameters: {         // Time information of the data source
            timestampColumn: "dropoff_datetime"
            timestampColumnFormat: "yyyy-MM-dd HH:mm:ss"
        }
    }
}
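The `timestampColumnFormat` above uses a Java-style date pattern. When preparing or checking source data in Python, the equivalent `strptime` format is `%Y-%m-%d %H:%M:%S`. The helper below is a hypothetical sketch for validating `dropoff_datetime` values before they reach the data source; it is not part of Feathr.

```python
from datetime import datetime

# Python equivalent of the Java-style pattern "yyyy-MM-dd HH:mm:ss".
PY_TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_dropoff(value):
    """Parse a dropoff_datetime string; raises ValueError if the format does not match."""
    return datetime.strptime(value, PY_TIMESTAMP_FORMAT)


parsed = parse_dropoff("2021-01-03 08:30:00")
```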
// Features that depend on other features instead of external raw data sources
derivations: {
    f_trip_time_distance: {     // Name of the derived feature
        definition: "f_trip_distance * f_trip_time_duration"
        type: NUMERIC
    }
}
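A derived feature is computed from other feature values rather than from raw source columns. The sketch below mirrors the `definition` expression above for one row of already-computed features; the function and the sample values are hypothetical and only illustrate the dependency, not how Feathr evaluates derivations.

```python
def f_trip_time_distance(features):
    # Mirrors the definition "f_trip_distance * f_trip_time_duration".
    return features["f_trip_distance"] * features["f_trip_time_duration"]


# Hypothetical, already-computed input features for one row.
base_features = {"f_trip_distance": 2.5, "f_trip_time_duration": 600}
derived = f_trip_time_distance(base_features)
```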
Feathr has native integration with Azure and other cloud services, and here's the high-level architecture to help you get started.
The Public Preview release doesn't guarantee API stability and may introduce API changes.
- Private Preview release
- Public Preview release
- Alpha version release
- Support streaming and online transformation
- Support feature versioning
- Support more data sources
Built for the community and built by the community. Check out the community guidelines.
Join our Slack for questions and discussions.