forked from jorgecasas/php-ml
Merge pull request #9 from php-ai/develop
Add missing docs and create changelog
Showing 8 changed files with 202 additions and 21 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
CHANGELOG | ||
========= | ||
|
||
This changelog lists the relevant changes made to the PHP-ML library. | ||
|
||
* 0.2.0 (in plan) | ||
* feature [Dataset] - FileDataset - load dataset from files (folders as targets) | ||
* feature [Metric] - ClassificationReport - report about trained classifier | ||
|
||
* 0.1.1 (2016-07-12) | ||
* feature [Cross Validation] Stratified Random Split - equal distribution for targets in split | ||
* feature [General] Documentation - add missing pages (Pipeline, ConfusionMatrix and TfIdfTransformer) and fix links | ||
|
||
* 0.1.0 (2016-07-08) | ||
* first develop release | ||
* base tools for Machine Learning: Algorithms, Cross Validation, Preprocessing, Feature Extraction | ||
* bug [General] #7 - PHP-ML doesn't work on Mac |
42 changes: 42 additions & 0 deletions
docs/machine-learning/feature-extraction/tf-idf-transformer.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Tf-idf Transformer | ||
|
||
Tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. | ||
|
||
### Constructor Parameters | ||
|
||
* $samples (array) - samples used to fit the tf-idf model | ||
|
||
``` | ||
use Phpml\FeatureExtraction\TfIdfTransformer; | ||
$samples = [ | ||
[1, 2, 4], | ||
[0, 2, 1] | ||
]; | ||
$transformer = new TfIdfTransformer($samples); | ||
``` | ||
|
||
### Transformation | ||
|
||
To transform a collection of text samples, use the `transform` method. Example: | ||
|
||
``` | ||
use Phpml\FeatureExtraction\TfIdfTransformer; | ||
$samples = [ | ||
[0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0], | ||
[0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3], | ||
]; | ||
$transformer = new TfIdfTransformer($samples); | ||
$transformer->transform($samples); | ||
/* | ||
$samples = [ | ||
[0 => 0, 1 => 0, 2 => 0.602, 3 => 0.301, 4 => 0, 5 => 0], | ||
[0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 0.602, 5 => 0.903], | ||
]; | ||
*/ | ||
``` |
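
The output values in the example above are consistent with weighting each term count by `log10(N / df)`, where `N` is the number of samples and `df` is the number of samples in which the feature is non-zero. The following is a minimal sketch under that assumption, written in plain PHP without the library. The function name `tfIdfSketch` is hypothetical, and the rounding to three decimals is only there to match the precision shown in the example.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative tf-idf weighting; NOT the Phpml implementation.
 * Assumes idf = log10(N / df), inferred from the documented example output.
 */
function tfIdfSketch(array $samples): array
{
    $documentCount = count($samples);

    // Count, per feature, how many samples contain it at least once.
    $documentFrequency = [];
    foreach ($samples as $sample) {
        foreach ($sample as $feature => $count) {
            if ($count > 0) {
                $documentFrequency[$feature] = ($documentFrequency[$feature] ?? 0) + 1;
            }
        }
    }

    // Multiply every term count by the inverse document frequency of its feature.
    foreach ($samples as &$sample) {
        foreach ($sample as $feature => &$count) {
            $df = $documentFrequency[$feature] ?? 0;
            $idf = $df > 0 ? log10($documentCount / $df) : 0.0;
            $count = round($count * $idf, 3); // rounded for display only
        }
        unset($count);
    }
    unset($sample);

    return $samples;
}
```

Features that appear in every sample get an idf of `log10(1) = 0`, which is why features 0 and 1 are zeroed out in the example.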
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# Confusion Matrix | ||
|
||
Class for computing a confusion matrix to evaluate the accuracy of a classification. | ||
|
||
### Example (all targets) | ||
|
||
Compute ConfusionMatrix for all targets. | ||
|
||
``` | ||
use Phpml\Metric\ConfusionMatrix; | ||
$actualTargets = [2, 0, 2, 2, 0, 1]; | ||
$predictedTargets = [0, 0, 2, 2, 0, 2]; | ||
$confusionMatrix = ConfusionMatrix::compute($actualTargets, $predictedTargets); | ||
/* | ||
$confusionMatrix = [ | ||
[2, 0, 0], | ||
[0, 0, 1], | ||
[1, 0, 2], | ||
]; | ||
*/ | ||
``` | ||
|
||
### Example (chosen targets) | ||
|
||
Compute ConfusionMatrix for chosen targets. | ||
|
||
``` | ||
use Phpml\Metric\ConfusionMatrix; | ||
$actualTargets = ['cat', 'ant', 'cat', 'cat', 'ant', 'bird']; | ||
$predictedTargets = ['ant', 'ant', 'cat', 'cat', 'ant', 'cat']; | ||
$confusionMatrix = ConfusionMatrix::compute($actualTargets, $predictedTargets, ['ant', 'bird']); | ||
/* | ||
$confusionMatrix = [ | ||
[2, 0], | ||
[0, 0], | ||
]; | ||
*/ | ||
``` |
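
To make the layout of the result easier to read, here is a minimal plain-PHP sketch of how such a matrix can be built. It is not the library's source. The function name `confusionMatrixSketch` is hypothetical, and it assumes that rows correspond to actual targets, columns to predicted targets (as the example outputs above imply), and that labels default to the sorted unique actual targets.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative confusion matrix; NOT the Phpml implementation.
 * Rows index actual labels, columns index predicted labels.
 */
function confusionMatrixSketch(array $actual, array $predicted, ?array $labels = null): array
{
    // Assumption: without explicit labels, use the sorted unique actual targets.
    if ($labels === null) {
        $labels = array_values(array_unique($actual));
        sort($labels);
    }

    // Map each label to its row/column position.
    $index = array_flip($labels);
    $size = count($labels);
    $matrix = array_fill(0, $size, array_fill(0, $size, 0));

    foreach ($actual as $i => $actualLabel) {
        $predictedLabel = $predicted[$i];
        // Pairs involving a label outside the chosen set are skipped.
        if (!isset($index[$actualLabel], $index[$predictedLabel])) {
            continue;
        }
        $matrix[$index[$actualLabel]][$index[$predictedLabel]]++;
    }

    return $matrix;
}
```

In the chosen-targets example, the single `bird` sample was predicted as `cat`, which is outside `['ant', 'bird']`, so the second row stays `[0, 0]`.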
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
# Pipeline | ||
|
||
In machine learning, it is common to run a sequence of algorithms to process and learn from a dataset. For example: | ||
|
||
* Split each document’s text into tokens. | ||
* Convert each document’s words into a numerical feature vector ([Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)). | ||
* Learn a prediction model using the feature vectors and labels. | ||
|
||
PHP-ML represents such a workflow as a Pipeline, which consists of a sequence of transformers and an estimator. | ||
|
||
|
||
### Constructor Parameters | ||
|
||
* $transformers (array|Transformer[]) - sequence of objects that implement the Transformer interface | ||
* $estimator (Estimator) - estimator that can train and predict | ||
|
||
``` | ||
use Phpml\Classification\SVC; | ||
use Phpml\FeatureExtraction\TfIdfTransformer; | ||
use Phpml\Pipeline; | ||
$transformers = [ | ||
new TfIdfTransformer(), | ||
]; | ||
$estimator = new SVC(); | ||
$pipeline = new Pipeline($transformers, $estimator); | ||
``` | ||
|
||
### Example | ||
|
||
First, the pipeline replaces missing values, then normalizes the samples, and finally trains the SVC estimator. A pipeline prepared this way repeats each transformation step for every sample passed to `predict`. | ||
|
||
``` | ||
use Phpml\Classification\SVC; | ||
use Phpml\Pipeline; | ||
use Phpml\Preprocessing\Imputer; | ||
use Phpml\Preprocessing\Normalizer; | ||
use Phpml\Preprocessing\Imputer\Strategy\MostFrequentStrategy; | ||
$transformers = [ | ||
new Imputer(null, new MostFrequentStrategy()), | ||
new Normalizer(), | ||
]; | ||
$estimator = new SVC(); | ||
$samples = [ | ||
[1, -1, 2], | ||
[2, 0, null], | ||
[null, 1, -1], | ||
]; | ||
$targets = [ | ||
4, | ||
1, | ||
4, | ||
]; | ||
$pipeline = new Pipeline($transformers, $estimator); | ||
$pipeline->train($samples, $targets); | ||
$predicted = $pipeline->predict([[0, 0, 0]]); | ||
// $predicted == 4 | ||
``` |
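
The idea that the same transformations run at both training and prediction time can be sketched without the library. Everything below is a hypothetical stand-in written for illustration: `SketchTransformer`, `SketchEstimator`, `SketchPipeline`, `NullToZero` and `NearestSampleEstimator` are not Phpml classes, and the real interfaces may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical interfaces for illustration; NOT the Phpml ones.
interface SketchTransformer
{
    public function transform(array &$samples): void;
}

interface SketchEstimator
{
    public function train(array $samples, array $targets): void;
    public function predict(array $samples): array;
}

final class SketchPipeline
{
    private array $transformers;
    private SketchEstimator $estimator;

    public function __construct(array $transformers, SketchEstimator $estimator)
    {
        $this->transformers = $transformers;
        $this->estimator = $estimator;
    }

    public function train(array $samples, array $targets): void
    {
        $this->applyTransformers($samples);
        $this->estimator->train($samples, $targets);
    }

    public function predict(array $samples): array
    {
        // The same steps run again, so new samples match the training data.
        $this->applyTransformers($samples);
        return $this->estimator->predict($samples);
    }

    private function applyTransformers(array &$samples): void
    {
        foreach ($this->transformers as $transformer) {
            $transformer->transform($samples);
        }
    }
}

// Toy transformer: replaces missing values (null) with 0.
final class NullToZero implements SketchTransformer
{
    public function transform(array &$samples): void
    {
        foreach ($samples as &$sample) {
            foreach ($sample as &$value) {
                $value = $value ?? 0;
            }
            unset($value);
        }
        unset($sample);
    }
}

// Toy estimator: returns the target of the closest training sample.
final class NearestSampleEstimator implements SketchEstimator
{
    private array $samples = [];
    private array $targets = [];

    public function train(array $samples, array $targets): void
    {
        $this->samples = $samples;
        $this->targets = $targets;
    }

    public function predict(array $samples): array
    {
        $predictions = [];
        foreach ($samples as $sample) {
            $best = null;
            $bestDistance = INF;
            foreach ($this->samples as $i => $known) {
                $distance = 0.0;
                foreach ($known as $j => $value) {
                    $distance += ($value - $sample[$j]) ** 2; // squared Euclidean
                }
                if ($distance < $bestDistance) {
                    $bestDistance = $distance;
                    $best = $this->targets[$i];
                }
            }
            $predictions[] = $best;
        }
        return $predictions;
    }
}
```

Because `predict` reuses `applyTransformers`, a sample containing `null` can be passed directly to the pipeline; the toy estimator never sees a missing value.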