
Generation and evaluation of synthetic time series datasets (also, augmentations, visualizations, a collection of popular datasets) NeurIPS'24

AlexanderVNikitin/tsgm

Supports Python 3.8+.

Time Series Generative Modeling (TSGM)

Documentation | Tutorials

About TSGM

TSGM is an open-source framework for synthetic time series generation and augmentation.

The framework can be used for:

  • creating synthetic data, using historical data, black-box models, or a combined approach,
  • augmenting time series data,
  • evaluating synthetic data with respect to consistency, privacy, downstream performance, and more.

Install TSGM

pip install tsgm

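M1 and M2 chips:

On Apple M1 and M2 chips, install TensorFlow through conda first, then install tsgm without its dependencies and add the remaining packages separately: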

# Install tensorflow
conda install -c conda-forge tensorflow=2.9.1

# Install tsgm without dependencies
pip install tsgm --no-deps

# Install the rest of the dependencies (listed separately here for clarity)
conda install tensorflow-probability scipy antropy statsmodels dtaidistance networkx optuna prettytable seaborn scikit-learn yfinance tqdm

Train your generative model

For more examples, see our tutorials.

import tsgm
from tensorflow import keras  # needed for the optimizers and the loss below

# ... Define hyperparameters ...
# dataset is a tensor of shape n_samples x seq_len x feature_dim

# Zoo contains several prebuilt architectures: we choose a conditional GAN architecture
architecture = tsgm.models.architectures.zoo["cgan_base_c4_l1"](
    seq_len=seq_len, feat_dim=feature_dim,
    latent_dim=latent_dim, output_dim=0)
discriminator, generator = architecture.discriminator, architecture.generator

# Initialize GAN object with selected discriminator and generator
gan = tsgm.models.cgan.GAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
gan.fit(dataset, epochs=N_EPOCHS)

# Generate 100 synthetic samples
result = gan.generate(100)
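
The snippet above leaves the hyperparameters and the dataset undefined. As a minimal, dependency-free sketch of what they could look like, the block below builds a toy sine-wave dataset in the n_samples x seq_len x feature_dim layout. The helper `make_sine_dataset` and all concrete values are hypothetical, chosen only for illustration; they are not part of TSGM and are not recommended settings.

```python
import math
import random


def make_sine_dataset(n_samples, seq_len, feature_dim, seed=0):
    """Return nested lists shaped [n_samples][seq_len][feature_dim], values in [0, 1]."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        # One random frequency and phase per feature channel.
        freqs = [rng.uniform(1.0, 3.0) for _ in range(feature_dim)]
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(feature_dim)]
        sample = [
            [0.5 * (1.0 + math.sin(2.0 * math.pi * f * t / seq_len + p))
             for f, p in zip(freqs, phases)]
            for t in range(seq_len)
        ]
        dataset.append(sample)
    return dataset


# Hypothetical hyperparameter values, for illustration only.
seq_len, feature_dim, latent_dim = 64, 1, 16
N_EPOCHS = 10
toy_dataset = make_sine_dataset(n_samples=8, seq_len=seq_len, feature_dim=feature_dim)
```

Before training, such nested lists would typically be converted to a float tensor or array and batched; the tutorials show the exact input format the models expect.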

Getting started

We provide:

  • Documentation with a complete overview of the implemented methods,
  • Tutorials that describe practical use-cases of the framework.

💾 Datasets

| Dataset | API | Description |
| --- | --- | --- |
| UCR Dataset | tsgm.utils.UCRDataManager | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
| Mauna Loa | tsgm.utils.get_mauna_loa() | https://gml.noaa.gov/ccgg/trends/data.html |
| EEG & Eye state | tsgm.utils.get_eeg() | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
| Power consumption dataset | tsgm.utils.get_power_consumption() | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
| Stock data | tsgm.utils.get_stock_data(ticker_name) | Gets historical stock data from YFinance |
| COVID-19 over the US | tsgm.utils.get_covid_19() | COVID-19 distribution over the US |
| Energy Data (UCI) | tsgm.utils.get_energy_data() | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
| MNIST as time series | tsgm.utils.get_mnist_data() | https://en.wikipedia.org/wiki/MNIST_database |
| Samples from GPs | tsgm.utils.get_gp_samples_data() | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | tsgm.utils.get_physionet2012() | https://archive.physionet.org/pn3/challenge/2012/ |

TSGM provides an API for convenient access to many time series datasets (currently more than 20). The comprehensive list of datasets is in the documentation.

Augmentations

TSGM provides a number of time series augmentations.

| Augmentation | Class in TSGM | Reference |
| --- | --- | --- |
| Gaussian Noise / Jittering | tsgm.augmentations.GaussianNoise | - |
| Slice-And-Shuffle | tsgm.augmentations.SliceAndShuffle | - |
| Shuffle Features | tsgm.augmentations.Shuffle | - |
| Magnitude Warping | tsgm.augmentations.MagnitudeWarping | Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks |
| Window Warping | tsgm.augmentations.WindowWarping | Data Augmentation for Time Series Classification using Convolutional Neural Networks |
| DTW Barycentric Averaging | tsgm.augmentations.DTWBarycentricAveraging | A global averaging method for dynamic time warping, with applications to clustering. |
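
To give a feel for what the two simplest augmentations in the table do, here is a standalone sketch of jittering and slice-and-shuffle on a single series stored as [seq_len][feature_dim] nested lists. It is an illustration written for this README, not the TSGM implementation: the function names, parameters, and the evenly spaced cut points are assumptions, and the classes in tsgm.augmentations may behave differently.

```python
import random


def jitter(series, sigma=0.05, rng=None):
    """Add independent Gaussian noise to every value of the series."""
    rng = rng or random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in step] for step in series]


def slice_and_shuffle(series, n_segments=4, rng=None):
    """Cut the series into contiguous time segments and permute them."""
    rng = rng or random.Random()
    seq_len = len(series)
    n_segments = max(1, min(n_segments, seq_len))
    # Evenly spaced cut points (an assumption made for this sketch).
    bounds = [(i * seq_len) // n_segments for i in range(n_segments + 1)]
    segments = [series[a:b] for a, b in zip(bounds, bounds[1:])]
    rng.shuffle(segments)
    # Flatten the shuffled segments back into one series.
    return [step for segment in segments for step in segment]


# Toy univariate series with 8 time steps.
series = [[float(t)] for t in range(8)]
rng = random.Random(0)
noisy = jitter(series, sigma=0.1, rng=rng)
shuffled = slice_and_shuffle(series, n_segments=4, rng=rng)
```

Jittering keeps the temporal order and perturbs the values, while slice-and-shuffle keeps the values and perturbs the order of segments, so the two can suit different downstream tasks.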

Contributing

We appreciate all contributions. To learn more, please check CONTRIBUTING.md.

For contributors

git clone https://github.com/AlexanderVNikitin/tsgm.git
cd tsgm
pip install -e .

Run tests:

python -m pytest

To check static typing:

mypy

CLI

We provide two command-line tools for synthetic data generation and evaluation:

  • tsgm-gd generates data from a stored sample,
  • tsgm-eval evaluates the generated time series.

Use tsgm-gd --help or tsgm-eval --help for documentation.

Citing

If you find this repo useful, please consider citing our paper:

@article{
  nikitin2023tsgm,
  title={TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series},
  author={Nikitin, Alexander and Iannucci, Letizia and Kaski, Samuel},
  journal={arXiv preprint arXiv:2305.11567},
  year={2023}
}

License

Apache License 2.0