forked from google/temporian

Temporian is an open-source Python library for preprocessing ⚡ and feature engineering 🛠 temporal data 📈 for machine learning applications 🤖

bbcho/temporian

Temporian is a library to pre-process temporal signals before their use as input features with off-the-shelf tabular machine learning libraries (e.g., TensorFlow Decision Forests, scikit-learn).

Usage Example

A minimal end-to-end run looks as follows:

import temporian as tp

# Load the data
evset = tp.read_event_set("path/to/data.csv")
node = evset.node()

# Create Simple Moving Average feature
sma_node = tp.simple_moving_average(
    node,
    window_length=tp.day(5),
)

# Create Lag feature
lag_node = tp.lag(
    node,
    lag=tp.week(1),
)

# Glue features
output_node = tp.glue(node, sma_node)
output_node = tp.glue(output_node, lag_node)


# Execute pipeline and get results
output_evset = tp.evaluate(
    output_node,
    {
        node: evset,
    },
)

Warning: The library is still under construction. This example usage is what we are aiming to build in the short term.

Supported Features

Temporian currently supports the following features for pre-processing your temporal data:

  • Simple Moving Average: calculates the average value of each feature over a specified time window.
  • Lag: creates new features by shifting the time series data backwards in time by a specified period.
  • Arithmetic Operations: allows you to perform arithmetic operations (such as addition, subtraction, multiplication, and division) on time series data, between different events.
  • More features coming soon!
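
To make the first two operations concrete, here is a minimal pure-Python sketch of a time-windowed moving average and a lag on timestamped events. It does not use Temporian's API. The function names, the half-open window (t - window_length, t], and the reading of lag as "the value observed at time t becomes available at time t + lag" are assumptions made for illustration only.

```python
from bisect import bisect_right


def simple_moving_average(timestamps, values, window_length):
    """Average, for each event at time t, the values in (t - window_length, t].

    Assumes `timestamps` is sorted in strictly increasing order.
    """
    out = []
    for i, t in enumerate(timestamps):
        # Index of the first event whose timestamp is > t - window_length.
        start = bisect_right(timestamps, t - window_length, 0, i + 1)
        window = values[start : i + 1]
        out.append(sum(window) / len(window))
    return out


def lag(timestamps, values, duration):
    """Shift every event by `duration`, keeping its value unchanged (assumed semantics)."""
    return [t + duration for t in timestamps], list(values)


timestamps = [0, 1, 2, 5]
values = [10.0, 20.0, 30.0, 40.0]

sma = simple_moving_average(timestamps, values, window_length=2)
lagged_timestamps, lagged_values = lag(timestamps, values, duration=7)
```

Note that the window is defined over time, not over a fixed number of rows, so the event at timestamp 5 averages only itself: no other event falls within the preceding 2 time units.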

Documentation

The official documentation is available at temporian.readthedocs.io.

Environment Setup

Dependencies are managed through Poetry. To install Poetry, execute the following command:

curl -sSL https://install.python-poetry.org | python3 -

You can verify Poetry was correctly installed by executing:

poetry --version

The environment requires Python 3.9.0 or greater. We recommend using PyEnv to install and manage multiple Python versions. To install PyEnv, head over to the tool's documentation on GitHub and follow the installation instructions for your operating system.
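
If you are unsure whether the interpreter on your path meets this requirement, a short check like the following can help. It is only a convenience sketch; the helper name meets_minimum is hypothetical and not part of the project.

```python
import sys

# Minimum version stated above.
MIN_VERSION = (3, 9, 0)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```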

Once PyEnv is installed, you can download any Python version (e.g. 3.9.6) by running:

pyenv install 3.9.6

After both Poetry and an adequate Python version have been installed, you can proceed to install the virtual environment and the required dependencies. Navigate to the project's root directory (where the pyproject.toml file is located) and execute:

poetry install

To have the virtual environment created inside the project's root directory, execute poetry config virtualenvs.in-project true before running poetry install.

Finally, activate the virtual environment by executing:

poetry shell

Testing

Install Bazel and Buildifier (on macOS, we recommend installing Bazelisk with Homebrew):

brew install bazelisk

Run all tests with bazel:

bazel test //...:all

Note: You can use the Bazel test flag --test_output=streamed to see the test logs in realtime.

Benchmarking and profiling

Benchmarking and profiling of pre-configured scripts are available as follows:

Time and memory profiling

bazel run -c opt benchmark:profile_time -- [name]
bazel run -c opt benchmark:profile_memory -- [name] [-p]

Here, [name] is the name of one of the Python scripts in benchmark/scripts; for example, bazel run -c opt benchmark:profile_time -- basic.

The -p flag displays a plot of memory usage over time instead of line-by-line memory consumption.

Time benchmarking

bazel run -c opt benchmark:benchmark_time

Example of results:

================================================================
Name                              Wall time (s)    CPU time (s)
================================================================
from_dataframe:100                   0.01601       0.01600
from_dataframe:10000                 0.03091       0.03091
from_dataframe:1000000               1.05764       1.05122
----------------------------------------------------------------
simple_moving_average:100            0.00108       0.00108
simple_moving_average:10000          0.00150       0.00150
simple_moving_average:1000000        0.00839       0.00839
----------------------------------------------------------------
select_and_glue:100                  0.00076       0.00076
select_and_glue:10000                0.00074       0.00074
select_and_glue:1000000              0.00104       0.00104
----------------------------------------------------------------
...
================================================================
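
The two columns in the table above report different clocks. As a rough illustration of the distinction, and not necessarily how the benchmark script itself measures them, the sketch below times a callable with Python's standard time.perf_counter (wall time) and time.process_time (CPU time of the current process). The helper name measure is hypothetical.

```python
import time


def measure(fn, *args, **kwargs):
    """Run `fn` once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # elapsed real time, includes waiting
    cpu_start = time.process_time()   # CPU time used by this process only
    result = fn(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


result, wall, cpu = measure(sum, range(1_000_000))
print(f"wall={wall:.5f}s cpu={cpu:.5f}s")
```

For single-threaded, CPU-bound work the two values are close, as in the table. Wall time grows larger than CPU time when the code waits on I/O or sleeps.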

Run documentation server locally

Preview your local changes to the documentation live with:

mkdocs serve -f docs/mkdocs.yml

Credits

This project is a collaboration between Google and Tryolabs.
