A scikit-learn compatible Python toolbox for machine learning with time series

open-data-foundation/sktime

sktime

sktime is a Python toolbox for machine learning with time series. We currently support:

  • Forecasting,
  • Time series classification,
  • Time series regression.

sktime provides dedicated time series algorithms and scikit-learn compatible tools for building, tuning, and evaluating composite models.

For deep learning methods, see our companion package: sktime-dl.


Installation

The package is available via PyPI using:

pip install sktime

The package is actively being developed and some features may not be stable yet.

Development Version

To install the development version, please see our advanced installation instructions.


Quickstart

Forecasting

import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.theta import ThetaForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss

y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = np.arange(1, len(y_test) + 1)  # forecasting horizon
forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
smape_loss(y_test, y_pred)
>>> 0.1722386848882188

For more, check out the forecasting tutorial.
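To make the evaluation step above more concrete, here is a minimal NumPy-only sketch of a temporal (unshuffled) train/test split and a symmetric MAPE score. The helper names `temporal_split` and `smape`, the `test_fraction` parameter, and the toy data are assumptions for illustration; sMAPE definitions vary, so this is one common form and is not claimed to match `smape_loss` exactly.

```python
import numpy as np


def temporal_split(y, test_fraction=0.25):
    """Hold out the last part of the series; no shuffling (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n_test = int(np.ceil(len(y) * test_fraction))
    return y[:-n_test], y[-n_test:]


def smape(y_true, y_pred):
    """Mean of 2|y - y_hat| / (|y| + |y_hat|); undefined where both are zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(2.0 * np.abs(y_true - y_pred)
                         / (np.abs(y_true) + np.abs(y_pred))))


# Toy series (made-up values, not the airline data)
y = np.array([100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0])
y_train, y_test = temporal_split(y)            # the last 2 points are held out
y_pred = np.full(len(y_test), y_train[-1])     # naive "last value" forecast
score = smape(y_test, y_pred)
```

Splitting by position rather than at random keeps every test observation later in time than every training observation, which is what a forecasting evaluation needs.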

Time Series Classification

from sktime.datasets import load_arrow_head
from sktime.classification.compose import TimeSeriesForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 0.7924528301886793

For more, check out the time series classification tutorial.


Documentation


API Overview

sktime is a unified toolbox for machine learning with time series. Time series give rise to multiple learning tasks (e.g. forecasting and time series classification). The goal of sktime is to provide all the necessary tools to solve these tasks, including dedicated time series algorithms as well as tools for building, tuning and evaluating composite models.

Many of these tasks are related, and an algorithm that can solve one of them can often be re-used to help solve another, an idea called reduction. sktime's unified interface makes it easy to adapt an algorithm for one task to another.

For example, to use a regression algorithm to solve a forecasting task, we can simply write:

import numpy as np
from sktime.datasets import load_airline
from sktime.forecasting.compose import ReducedRegressionForecaster
from sklearn.ensemble import RandomForestRegressor
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss

y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = np.arange(1, len(y_test) + 1)  # forecasting horizon
regressor = RandomForestRegressor()
forecaster = ReducedRegressionForecaster(regressor, window_length=12)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
smape_loss(y_test, y_pred)

For more details, check out our paper.
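As a rough illustration of the reduction idea, the sketch below turns a series into a tabular regression problem with a sliding window, fits a regressor, and forecasts recursively. It uses plain NumPy with ordinary least squares as a stand-in regressor. The function names and the recursive strategy are assumptions for illustration, not a description of how ReducedRegressionForecaster is implemented.

```python
import numpy as np


def make_windows(y, window_length):
    """Sliding window: each row holds window_length past values, target is the next value."""
    y = np.asarray(y, dtype=float)
    X = np.array([y[i:i + window_length] for i in range(len(y) - window_length)])
    target = y[window_length:]
    return X, target


def fit_linear(X, target):
    """Ordinary least squares with an intercept (stand-in for any tabular regressor)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef


def recursive_forecast(y, coef, window_length, n_steps):
    """Predict one step ahead repeatedly, feeding each prediction back into the window."""
    history = list(np.asarray(y, dtype=float)[-window_length:])
    preds = []
    for _ in range(n_steps):
        window = np.array(history[-window_length:])
        y_next = float(window @ coef[:-1] + coef[-1])
        preds.append(y_next)
        history.append(y_next)
    return np.array(preds)


y = np.arange(20, dtype=float)                 # toy linear trend: 0, 1, ..., 19
X, target = make_windows(y, window_length=3)   # X has shape (17, 3)
coef = fit_linear(X, target)
y_pred = recursive_forecast(y, coef, window_length=3, n_steps=4)
```

On this toy trend the forecasts continue the line (approximately 20, 21, 22, 23). Any regressor with a fit/predict interface could take the place of the least-squares step.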

Currently, sktime provides:

  • State-of-the-art algorithms for time series classification and regression, ported from the Java-based tsml toolkit, as well as forecasting,
  • Transformers, including single-series transformations (e.g. detrending or deseasonalization) and series-as-features transformations (e.g. feature extractors), as well as tools to compose different transformers,
  • Pipelining,
  • Tuning,
  • Ensembling, such as a fully customisable random forest for time-series classification and regression, as well as ensembling for multivariate problems.
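The two kinds of transformation mentioned in the list above can be sketched in a few lines of NumPy. This is a conceptual illustration only: `detrend` and `summary_features` are hypothetical helpers, not sktime transformers, and the choice of features (mean, standard deviation, slope) is an assumption.

```python
import numpy as np


def detrend(y):
    """Single-series transformation: subtract a fitted linear trend, series in, series out."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    return y - (slope * t + intercept)


def summary_features(panel):
    """Series-as-features transformation: one row of tabular features per series."""
    panel = np.asarray(panel, dtype=float)     # shape (n_series, n_timepoints)
    t = np.arange(panel.shape[1])
    slopes = np.polyfit(t, panel.T, deg=1)[0]  # one fitted slope per series
    return np.column_stack([panel.mean(axis=1), panel.std(axis=1), slopes])


residuals = detrend([1.0, 3.0, 5.0, 7.0])                        # perfectly linear input
features = summary_features([[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]])  # shape (2, 3)
```

The first function maps a series to a series of the same length, while the second maps a panel of series to an ordinary feature table that a tabular estimator could consume.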

For a list of implemented methods, see our estimator overview.

In addition, sktime includes an experimental high-level API that unifies multiple learning tasks, partially inspired by the APIs of mlr and OpenML.


Development Roadmap

sktime is under active development. We are looking for new contributors, and all contributions are welcome! Planned work includes:

  1. Multivariate/panel forecasting based on a modified pysf API,
  2. Unsupervised learning, including time series clustering,
  3. Time series annotation, including segmentation and outlier detection,
  4. Specialised data container for efficient handling of time series/panel data in a modelling workflow and separation of time series meta-data,
  5. Probabilistic modelling framework for time series, including survival and point process models based on an adapted skpro interface.

For more details, read this issue.


How to contribute

For former and current contributors, see our overview.


How to cite sktime

If you use sktime in a scientific publication, we would appreciate citations to the following paper:

Markus Löning, Anthony Bagnall, Sajaysurya Ganesh, Viktor Kazakov, Jason Lines, Franz Király (2019): “sktime: A Unified Interface for Machine Learning with Time Series”

Bibtex entry:

@inproceedings{sktime,
    author = {L{\"{o}}ning, Markus and Bagnall, Anthony and Ganesh, Sajaysurya and Kazakov, Viktor and Lines, Jason and Kir{\'{a}}ly, Franz J},
    booktitle = {Workshop on Systems for ML at NeurIPS 2019},
    title = {{sktime: A Unified Interface for Machine Learning with Time Series}},
    date = {2019},
}
