This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Object Storage

Frank Noe edited this page Apr 16, 2016 · 10 revisions

What? Implement a method to easily save/load PyEMMA high-level objects to/from disk.

Why? To facilitate interactive or scripted analysis with large datasets. Without adaptations, the behavior of the pickle module is not well-defined, because it is not defined a priori which attributes are data belonging to an object and which are just links to other resources.

Estimators

  • Estimation parameters get_params() and set_params(): Input parameters used to construct the estimator object.
  • Estimation state get_state() and set_state(): State variables set by estimation. This includes estimates that connect data and model, such as convergence information.
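
The split between estimation parameters and estimation state could look roughly like the sketch below. `ToyEstimator`, its `lag` parameter and its `_count_matrix` attribute are hypothetical names made up for illustration; they are not PyEMMA classes, and only the four method names come from the list above.

```python
import numpy as np


class ToyEstimator(object):
    """Hypothetical estimator used only for illustration (not a PyEMMA class)."""

    def __init__(self, lag=1):
        self.lag = lag              # estimation parameter: constructor input
        self._count_matrix = None   # estimation state: set by estimate()

    # --- estimation parameters -------------------------------------------
    def get_params(self):
        return {'lag': self.lag}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    # --- estimation state ------------------------------------------------
    def get_state(self):
        return {'count_matrix': self._count_matrix}

    def set_state(self, state):
        self._count_matrix = state['count_matrix']
        return self

    def estimate(self, dtraj):
        # count transitions i -> j at the given lag time
        dtraj = np.asarray(dtraj)
        n = dtraj.max() + 1
        counts = np.zeros((n, n), dtype=int)
        np.add.at(counts, (dtraj[:-self.lag], dtraj[self.lag:]), 1)
        self._count_matrix = counts
        return self


est = ToyEstimator(lag=1).estimate([1, 0, 0, 0, 1, 1, 0])

# rebuild an equivalent estimator from parameters and state alone
clone = ToyEstimator(**est.get_params()).set_state(est.get_state())
```

Under this assumption, anything returned by `get_params()` and `get_state()` is data owned by the object and is what a save/load routine would write to disk.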

Models

Can be mixed into an estimator or used standalone.

  • Model parameters get_model_params() and set_model_params(): Estimated or set parameters of the model.
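
As a rough sketch of the two usage modes, the example below shows a standalone model and an estimator that mixes the model in. `ToyModel`, `ToyMLE` and the parameter name `P` are hypothetical and chosen for illustration; only `get_model_params()` and `set_model_params()` are taken from the text above. The estimator assumes every state has at least one outgoing transition.

```python
import numpy as np


class ToyModel(object):
    """Hypothetical model used only for illustration (not a PyEMMA class)."""

    def __init__(self):
        self.P = None  # model parameter: transition matrix

    def get_model_params(self):
        return {'P': self.P}

    def set_model_params(self, P=None):
        self.P = None if P is None else np.asarray(P, dtype=float)
        return self

    @property
    def stationary_distribution(self):
        # left eigenvector of P for the largest eigenvalue, normalized to sum 1
        evals, evecs = np.linalg.eig(self.P.T)
        pi = np.real(evecs[:, np.argmax(np.real(evals))])
        return pi / pi.sum()


class ToyMLE(ToyModel):
    """Hypothetical estimator with the model mixed in."""

    def __init__(self, lag=1):
        ToyModel.__init__(self)
        self.lag = lag

    def estimate(self, dtraj):
        dtraj = np.asarray(dtraj)
        n = dtraj.max() + 1
        counts = np.zeros((n, n))
        np.add.at(counts, (dtraj[:-self.lag], dtraj[self.lag:]), 1)
        # row-normalized counts; assumes no empty rows
        self.set_model_params(P=counts / counts.sum(axis=1, keepdims=True))
        return self


# mixed-in use: the estimator is also the model
mle = ToyMLE(lag=1).estimate([1, 0, 0, 0, 1, 1, 0])

# standalone use: a model rebuilt from its parameters alone
standalone = ToyModel().set_model_params(**mle.get_model_params())
pi = standalone.stationary_distribution
```

The standalone copy needs no estimation data: its stationary distribution follows from the model parameters alone.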

Transformers, KineticModels etc.

All of these are subclasses of Models and inherit the model I/O properties.

API

Estimator/Model save and load:

    import pyemma
    from pyemma import msm

    # save parametrized estimator
    mle = msm.estimate_markov_model([1, 0, 0, 0, 1, 1, 0], 1)
    mle.save('msm_mle.pyemma')

    # load parametrized estimator
    mle_recovered = pyemma.load('msm_mle.pyemma')
    mle_recovered.cktest(2)  # this works if estimation data was stored too

    # save just the model
    mle.model.save('msm.pyemma')

    # load just the model
    msm_recovered = pyemma.load('msm.pyemma')
    print(msm_recovered.stationary_distribution)  # this works with model parameters alone

Implementation

Pickle

We can implement object storage with the pickle or cPickle modules.

  • __getstate__() and __setstate__() need to be overloaded in order to save/load the desired content of Estimators, Models etc.
  • Does pickle have efficient protocols (compressed and fast)? Compare to np.savez_compressed
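
A minimal sketch of the overloading idea, assuming a hypothetical `ToyEstimator` whose `data_source` attribute is a link to an external resource that should not be stored. The sketch also measures the pickled size for two protocols. The pickle module does not compress its output, so the `zlib` step is an add-on used here only to get a number comparable to a compressed format.

```python
import pickle
import zlib

import numpy as np


class ToyEstimator(object):
    """Hypothetical class used only for illustration (not a PyEMMA class)."""

    def __init__(self, lag=1, data_source=None):
        self.lag = lag                  # owned: estimation parameter
        self.data_source = data_source  # link to an external resource
        self.count_matrix = None        # owned: estimation state

    def __getstate__(self):
        # store only what belongs to the object; drop the link
        return {'lag': self.lag, 'count_matrix': self.count_matrix}

    def __setstate__(self, state):
        self.lag = state['lag']
        self.count_matrix = state['count_matrix']
        self.data_source = None  # has to be re-attached by the caller


# a generator stands in for an unpicklable resource such as an open reader
est = ToyEstimator(lag=2, data_source=(x for x in range(3)))
est.count_matrix = np.zeros((100, 100))

blob = pickle.dumps(est, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# pickled size per protocol, and size after explicit zlib compression
sizes = {p: len(pickle.dumps(est, protocol=p))
         for p in (2, pickle.HIGHEST_PROTOCOL)}
compressed_size = len(zlib.compress(blob))
```

Without `__getstate__`, pickling this object would fail on the generator; with it, only the parameter and the state array are written.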

Implementation with numpy

We can implement object storage with np.savez_compressed and np.load
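
One way this could look, as a sketch: model parameters are kept in a dictionary of arrays and written to a single `.npz` archive. The helper names `save_model_params` and `load_model_params` and the parameter names `P` and `lag` are hypothetical. Scalars come back as 0-d arrays, so the caller has to convert them.

```python
import os
import tempfile

import numpy as np


def save_model_params(path, model_params):
    # hypothetical helper: each dictionary entry becomes one array in the archive
    np.savez_compressed(path, **model_params)


def load_model_params(path):
    # hypothetical helper: read all arrays back into a plain dictionary
    with np.load(path) as npz:
        return {name: npz[name] for name in npz.files}


params = {'P': np.array([[0.9, 0.1], [0.2, 0.8]]), 'lag': 1}

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'msm_params.npz')  # savez appends .npz if missing
save_model_params(path, params)

loaded = load_model_params(path)
lag = int(loaded['lag'])  # scalars are stored as 0-d arrays
```

This format only handles arrays and values that convert to arrays, so nested objects (for example an estimator holding a model) would need their own naming scheme inside the archive.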

Implementation with json
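
A possible starting point, sketched under assumptions: the json module cannot serialize NumPy arrays directly, so arrays are converted to nested lists and tagged so they can be rebuilt on load. The tag `__ndarray__` and all helper names are hypothetical choices for this example.

```python
import json

import numpy as np


def to_jsonable(value):
    # hypothetical encoding: tag arrays so they can be recognized on load
    if isinstance(value, np.ndarray):
        return {'__ndarray__': value.tolist(), 'dtype': str(value.dtype)}
    return value


def from_jsonable(value):
    if isinstance(value, dict) and '__ndarray__' in value:
        return np.array(value['__ndarray__'], dtype=value['dtype'])
    return value


def dumps_params(params):
    return json.dumps({k: to_jsonable(v) for k, v in params.items()})


def loads_params(text):
    return {k: from_jsonable(v) for k, v in json.loads(text).items()}


params = {'P': np.array([[0.9, 0.1], [0.2, 0.8]]), 'lag': 1}
text = dumps_params(params)
recovered = loads_params(text)
```

The result is human-readable, but large arrays stored as text take far more space than a binary format.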