SDR Classifier & Predictor: C++ Documentation.
* Docs cleanup
* Example usages
* Unit tests for example usages
* Minor code cleanups, non-functional changes.
ctrl-z-9000-times committed Apr 17, 2019
1 parent df98605 commit 66b6cfc
Showing 4 changed files with 136 additions and 43 deletions.
3 changes: 2 additions & 1 deletion src/nupic/algorithms/SDRClassifier.cpp
@@ -117,7 +117,7 @@ std::vector<Real> Classifier::calculateError_(
auto likelihoods = infer(pattern);

// Compute target likelihoods
vector<Real> targetDistribution(numCategories_ + 1, 0.0);
PDF targetDistribution(numCategories_ + 1, 0.0);
for( size_t i = 0; i < categoryIdxList.size(); i++ ) {
targetDistribution[categoryIdxList[i]] = 1.0 / categoryIdxList.size();
}
@@ -148,6 +148,7 @@ void nupic::algorithms::sdr_classifier::softmax(PDF::iterator begin, PDF::iterat

/******************************************************************************/


Predictor::Predictor(const vector<UInt> &steps, Real alpha)
{ initialize(steps, alpha); }

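The hunk above switches `targetDistribution` from a raw `std::vector<Real>` to the `PDF` alias. A minimal sketch of the error calculation this code belongs to, under stated assumptions: the function names and the plain `float`-based `PDF` alias here are illustrative stand-ins, not the library's exact types.

```cpp
#include <cstddef>
#include <vector>

using PDF = std::vector<float>;  // stand-in for the library's PDF alias

// Build the target distribution: each true category shares the probability
// mass equally; every other category gets zero.
PDF makeTargetDistribution(const std::vector<std::size_t> &categoryIdxList,
                           std::size_t numCategories) {
  PDF target(numCategories + 1, 0.0f);
  for (std::size_t idx : categoryIdxList) {
    target[idx] = 1.0f / categoryIdxList.size();
  }
  return target;
}

// The error is the element-wise difference between the target distribution
// and the inferred likelihoods.
PDF calculateError(const PDF &likelihoods, const PDF &target) {
  PDF error(target.size());
  for (std::size_t i = 0; i < target.size(); i++) {
    error[i] = target[i] - likelihoods[i];
  }
  return error;
}
```

With two true categories, each receives probability 1/2 and all other entries stay zero.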
108 changes: 73 additions & 35 deletions src/nupic/algorithms/SDRClassifier.hpp
@@ -36,49 +36,59 @@
#include <nupic/types/Sdr.hpp>

#include <nupic/types/Serializable.hpp>
#include <cereal/types/vector.hpp>
#include <cereal/types/deque.hpp>
#include <cereal/types/map.hpp>

namespace nupic {
namespace algorithms {
namespace sdr_classifier {


/**
* PDF - Probability Distribution Function, distribution of likelihood of values
* for each category.
* PDF: Probability Distribution Function. Each index in this vector is a
* category label, and each value is the likelihood of that category.
*
* See also: https://en.wikipedia.org/wiki/Probability_distribution
*/
using PDF = std::vector<Real>;

/**
* Returns the class with the single greatest probablility.
* Returns the category with the greatest probability.
*/
UInt argmax( const PDF & data );


/**
* The SDR Classifier takes the form of a single layer classification network
* that takes SDRs as input and outputs a predicted distribution of classes.
*
* The SDR Classifier accepts an SDR input pattern from the level below (the
* “pattern”) and information from the sensor and encoders (the
* “classification”) describing the true (target) input.
* The SDR Classifier takes the form of a single layer classification network.
* It accepts SDRs as input and outputs a predicted distribution of categories.
*
* The SDR classifier maps input patterns to class labels. There are as many
* output units as the maximum class label or bucket (in the case of scalar
* encoders). The output is a probabilistic distribution over all class labels.
* Categories are labeled using unsigned integers. Other data types must be
* enumerated or transformed into positive integers. There are as many output
* units as the maximum category label.
*
* During inference, the output is calculated by first doing a weighted
* summation of all the inputs, and then applying a softmax nonlinear function to
* get the predicted distribution of class labels
* get the predicted distribution of category labels.
*
* During learning, the connection weights between input units and output units
* are adjusted to maximize the likelihood of the model
* are adjusted to maximize the likelihood of the model.
*
* Example Usage:
*
* // Make a random SDR and associate it with the category B.
* SDR inputData({ 1000 });
* inputData.randomize( 0.02 );
* enum Category { A, B, C, D };
* Classifier clsr;
* clsr.learn( inputData, { Category::B } );
* argmax( clsr.infer( inputData ) ) -> Category::B
*
* Example Usage: TODO
* // Estimate a scalar value. The Classifier only accepts categories, so
* // put real valued inputs into bins (AKA buckets) by subtracting the
* // minimum value and dividing by a resolution.
* double scalar = 567.8;
* double minimum = 500;
* double resolution = 10;
* clsr.learn( inputData, { (UInt)((scalar - minimum) / resolution) } );
* argmax( clsr.infer( inputData ) ) * resolution + minimum -> 560
*
* References:
* - Alex Graves. Supervised Sequence Labeling with Recurrent Neural Networks,
@@ -100,16 +110,16 @@ class Classifier : public Serializable
Classifier(Real alpha = 0.001f );

/**
* Constructor for use when deserializing.
* For use when deserializing.
*/
Classifier() {}
void initialize(Real alpha);

/**
* Compute the likelihoods for each category / bucket.
*
* @param pattern: The active input bit SDR.
* @returns: The Probablility Density Function (PDF) of the categories.
* @param pattern: The SDR containing the active input bits.
* @returns: The Probability Distribution Function (PDF) of the categories.
* This is indexed by the category label.
*/
PDF infer(const sdr::SDR & pattern);

@@ -160,13 +170,13 @@ void softmax(PDF::iterator begin, PDF::iterator end);

/******************************************************************************/


/**
* The key is the step, for predicting multiple time steps into the future.
* The value is a PDF (probability distribution function, list of probabilities
* of outcomes) of the result being in each bucket.
* The value is a PDF (probability distribution function, of the result being in
* each bucket or category).
*/
using Predictions = std::map<Int, PDF>;

using Predictions = std::map<UInt, PDF>;

/**
* The Predictor class does N-Step ahead predictions.
@@ -176,41 +186,70 @@ using Predictions = std::map<Int, PDF>;
*
* Compatibility Note: This class is the replacement for the old SDRClassifier.
* It no longer provides estimates of the actual value.
*
* Example Usage:
*
* // Predict 1 and 2 time steps into the future.
*
* // First, make a sequence of 4 random SDRs.
* // Each SDR has 1000 bits and 2% sparsity.
* vector<SDR> sequence( 4, { 1000 } );
* for( SDR & inputData : sequence )
* inputData.randomize( 0.02 );
*
* // Second, make category labels for the sequence.
* vector<UInt> labels = { 4, 5, 6, 7 };
*
* // Third, make a Predictor and train it.
* Predictor pred( vector<UInt>{ 1, 2 } );
* pred.learn( 0, sequence[0], { labels[0] } );
* pred.learn( 1, sequence[1], { labels[1] } );
* pred.learn( 2, sequence[2], { labels[2] } );
* pred.learn( 3, sequence[3], { labels[3] } );
*
* // Fourth, give the predictor partial information, and make predictions
* // about the future.
* pred.reset();
* Predictions A = pred.infer( 0, sequence[0] );
* argmax( A[1] ) -> labels[1]
* argmax( A[2] ) -> labels[2]
*
* Predictions B = pred.infer( 1, sequence[1] );
* argmax( B[1] ) -> labels[2]
* argmax( B[2] ) -> labels[3]
*/
class Predictor : public Serializable
{
public:
/**
* Constructor.
*
* @param steps - The different number of steps to learn and predict.
* @param steps - The number of steps into the future to learn and predict.
* @param alpha - The alpha used to adapt the weight matrix during learning. A
* larger alpha results in faster adaptation to the data.
*/
Predictor(const std::vector<UInt> &steps, Real alpha);
Predictor(const std::vector<UInt> &steps, Real alpha = 0.001f );

/**
* Constructor for use when deserializing.
*/
Predictor() {}
void initialize(const std::vector<UInt> &steps, Real alpha);
void initialize(const std::vector<UInt> &steps, Real alpha = 0.001f );

/**
* For use with time series datasets.
*/
void reset();

/**
* Compute the likelihoods for each bucket.
* Compute the likelihoods.
*
* @param recordNum: An incrementing integer for each record. Gaps in
* numbers correspond to missing records.
*
* @param pattern: The active input SDR.
*
* @returns: A mapping from prediction step to a vector of likelihoods where
* the value at an index corresponds to the bucket with the same
* index.
* @returns: A mapping from prediction step to PDF.
*/
Predictions infer(UInt recordNum, const sdr::SDR &pattern);

@@ -243,8 +282,7 @@ class Predictor : public Serializable
// The list of prediction steps to learn and infer.
std::vector<UInt> steps_;

// Stores the input pattern history, starting with the previous input
// and containing _maxSteps total input patterns.
// Stores the input pattern history, starting with the previous input.
std::deque<sdr::SDR> patternHistory_;
std::deque<UInt> recordNumHistory_;
void updateHistory_(UInt recordNum, const sdr::SDR & pattern);
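The header above documents inference as a weighted summation followed by a softmax, and the `testSoftmaxOverflow` unit test below checks that huge inputs do not produce NaN. A hedged sketch of a numerically stable softmax over a plain `std::vector<float>` (not the library's exact implementation): subtracting the maximum before exponentiating is the standard trick that keeps `exp()` from overflowing.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Numerically stable softmax, applied in place. Subtracting the maximum
// value before exponentiating bounds every exponent at zero, so exp()
// never overflows even for very large inputs.
void stableSoftmax(std::vector<float> &values) {
  if (values.empty()) return;
  const float maxVal = *std::max_element(values.begin(), values.end());
  float sum = 0.0f;
  for (float &v : values) {
    v = std::exp(v - maxVal);
    sum += v;
  }
  for (float &v : values) {
    v /= sum;  // normalize so the result is a probability distribution
  }
}
```

A single-element input always maps to probability 1, and equal inputs split the mass evenly.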
4 changes: 2 additions & 2 deletions src/nupic/types/Serializable.hpp
@@ -217,8 +217,8 @@ class Serializable {
// Remove the following two lines.

// These must be implemented by the subclass.
virtual void save(std::ostream &stream) const { saveToStream_ar(stream); };
virtual void load(std::istream &stream) { loadFromStream_ar(stream); };
virtual void save(std::ostream &stream) const { };
virtual void load(std::istream &stream) { };


virtual inline void saveToFile_ar(std::string filePath, SerializableFormat fmt=SerializableFormat::BINARY) const {
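The hunk above empties the default `save`/`load` bodies, and the test changes below switch callers to `saveToStream_ar`/`loadFromStream_ar` instead. A minimal illustration of the stream round-trip pattern those tests rely on; the `Counter` type is invented for this sketch and is not the library's Cereal-based machinery.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// A toy serializable type: it writes its state to an output stream and
// reads it back from an input stream.
struct Counter {
  int value = 0;
  void save(std::ostream &stream) const { stream << value; }
  void load(std::istream &stream) { stream >> value; }
};

// Round-trip one object through a stringstream into a second object.
// The deserialized copy should hold the same state.
int roundTrip(int start) {
  Counter a;
  a.value = start;
  std::stringstream ss;
  a.save(ss);
  Counter b;
  b.load(ss);
  return b.value;
}
```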
64 changes: 59 additions & 5 deletions src/test/unit/algorithms/SDRClassifierTest.cpp
@@ -44,6 +44,60 @@ using namespace nupic::algorithms::sdr_classifier;
namespace testing {


TEST(SDRClassifierTest, ExampleUsageClassifier)
{
// Make a random SDR and associate it with the category B.
SDR inputData({ 1000u });
inputData.randomize( 0.02f );
enum Category { A, B, C, D };
Classifier clsr;
clsr.learn( inputData, { Category::B } );
ASSERT_EQ( argmax( clsr.infer( inputData ) ), Category::B );

// Estimate a scalar value. The Classifier only accepts categories, so
// put real valued inputs into bins (AKA buckets) by subtracting the
// minimum value and dividing by a resolution.
double scalar = 567.8f;
double minimum = 500.0f;
double resolution = 10.0f;
clsr.learn( inputData, { (UInt)((scalar - minimum) / resolution) } );
ASSERT_EQ( argmax( clsr.infer( inputData ) ) * resolution + minimum, 560.0f );
}


TEST(SDRClassifierTest, ExampleUsagePredictor)
{
// Predict 1 and 2 time steps into the future.

// First, make a sequence of 4 random SDRs.
// Each SDR has 1000 bits and 2% sparsity.
vector<SDR> sequence( 4u, vector<UInt>{ 1000u } );
for( SDR & inputData : sequence ) {
inputData.randomize( 0.02f );
}
// Second, make category labels for the sequence.
vector<UInt> labels = { 4, 5, 6, 7 };

// Third, make a Predictor and train it.
Predictor pred( vector<UInt>{ 1, 2 } );
pred.learn( 0, sequence[0], { labels[0] } );
pred.learn( 1, sequence[1], { labels[1] } );
pred.learn( 2, sequence[2], { labels[2] } );
pred.learn( 3, sequence[3], { labels[3] } );

// Fourth, give the predictor partial information, and make predictions
// about the future.
pred.reset();
Predictions A = pred.infer( 0, sequence[0] );
ASSERT_EQ( argmax( A[1] ), labels[1] );
ASSERT_EQ( argmax( A[2] ), labels[2] );

Predictions B = pred.infer( 1, sequence[1] );
ASSERT_EQ( argmax( B[1] ), labels[2] );
ASSERT_EQ( argmax( B[2] ), labels[3] );
}


TEST(SDRClassifierTest, SingleValue) {
// Feed the same input 10 times, the corresponding probability should be
// very high
@@ -112,7 +166,7 @@ TEST(SDRClassifierTest, ComputeComplex) {
}


TEST(ClassifierTest, MultipleCategories) {
TEST(SDRClassifierTest, MultipleCategories) {
// Test multiple category classification with single compute calls
// This test is ported from the Python unit test
Classifier c(1.0f);
@@ -168,25 +222,25 @@ TEST(SDRClassifierTest, SaveLoad) {

// Save and load.
stringstream ss;
EXPECT_NO_THROW(c1.save(ss));
EXPECT_NO_THROW(c1.saveToStream_ar(ss));
Predictor c2;
EXPECT_NO_THROW(c2.load(ss));
EXPECT_NO_THROW(c2.loadFromStream_ar(ss));

// Expect identical results.
const auto c2_out = c2.infer( 0u, A );
ASSERT_EQ(c1_out, c2_out);
}


TEST(ClassifierTest, testSoftmaxOverflow) {
TEST(SDRClassifierTest, testSoftmaxOverflow) {
PDF values({ numeric_limits<Real>::max() });
softmax(values.begin(), values.end());
auto result = values[0u];
ASSERT_FALSE(std::isnan(result));
}


TEST(ClassifierTest, testSoftmax) {
TEST(SDRClassifierTest, testSoftmax) {
PDF values {0.0f, 1.0f, 1.337f, 2.018f, 1.1f, 0.5f, 0.9f};
const PDF exp {
0.045123016137150938f,
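The `ExampleUsageClassifier` test above bins the scalar 567.8 with minimum 500 and resolution 10: the bucket index is the truncated quotient `(567.8 - 500) / 10 = 6.78 -> 6`, and decoding the bucket back gives `6 * 10 + 500 = 560`. A standalone sketch of that binning arithmetic, with helper names invented for illustration:

```cpp
// Scalar-to-bucket encoding used in the classifier example: subtract the
// minimum and divide by the resolution, truncating toward zero to get an
// integer bucket index.
unsigned encodeBucket(double scalar, double minimum, double resolution) {
  return (unsigned)((scalar - minimum) / resolution);
}

// Decoding maps a bucket index back to the lower edge of its bin, which is
// why 567.8 comes back as 560 rather than the original value.
double decodeBucket(unsigned bucket, double minimum, double resolution) {
  return bucket * resolution + minimum;
}
```

The round trip is lossy by design: every value in [560, 570) encodes to bucket 6 and decodes to 560.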
