Commit

create docs for tf-idf transformer
akondas committed Jul 11, 2016
1 parent ba89274 commit 7c0767c
Showing 6 changed files with 47 additions and 2 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -9,7 +9,7 @@ This changelog references the relevant changes done in PHP-ML library.

* 0.1.1 (2016-07-12)
* feature [Cross Validation] Stratified Random Split - equal distribution for targets in split
* feature [General] Documentation - add missing pages and fix links
* feature [General] Documentation - add missing pages (Pipeline, ConfusionMatrix and TfIdfTransformer) and fix links

* 0.1.0 (2016-07-08)
* first develop release
1 change: 1 addition & 0 deletions README.md
@@ -59,6 +59,7 @@ composer require php-ai/php-ml
* [Imputation missing values](http://php-ml.readthedocs.io/en/latest/machine-learning/preprocessing/imputation-missing-values/)
* Feature Extraction
* [Token Count Vectorizer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/token-count-vectorizer/)
* [Tf-idf Transformer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/tf-idf-transformer/)
* Datasets
* [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset/)
* Ready to use:
1 change: 1 addition & 0 deletions docs/index.md
@@ -59,6 +59,7 @@ composer require php-ai/php-ml
* [Imputation missing values](machine-learning/preprocessing/imputation-missing-values/)
* Feature Extraction
* [Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)
* [Tf-idf Transformer](machine-learning/feature-extraction/tf-idf-transformer/)
* Datasets
* [CSV](machine-learning/datasets/csv-dataset/)
* Ready to use:
42 changes: 42 additions & 0 deletions docs/machine-learning/feature-extraction/tf-idf-transformer.md
@@ -0,0 +1,42 @@
# Tf-idf Transformer

Tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.

### Constructor Parameters

* $samples (array) - samples used to fit the tf-idf model

```
use Phpml\FeatureExtraction\TfIdfTransformer;

$samples = [
    [1, 2, 4],
    [0, 2, 1]
];

$transformer = new TfIdfTransformer($samples);
```

### Transformation

To transform a collection of token-count samples, call the `transform` method. It overwrites `$samples` with the tf-idf weights, as the comment at the end of the example shows:

```
use Phpml\FeatureExtraction\TfIdfTransformer;

// Each row is one document; each key is a term index and each value a raw count.
$samples = [
    [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
    [0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3],
];

$transformer = new TfIdfTransformer($samples);
$transformer->transform($samples);

/*
$samples = [
    [0 => 0, 1 => 0, 2 => 0.602, 3 => 0.301, 4 => 0, 5 => 0],
    [0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 0.602, 5 => 0.903],
];
*/
```
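
The weights in the example above are consistent with multiplying each raw count by `log10(N / df)`, where `N` is the number of samples and `df` is the number of samples in which the term occurs. The sketch below recomputes the example by hand under that assumption. It does not use the library, and `tfIdfSketch` is a hypothetical helper written for illustration, not part of PHP-ML.

```php
<?php
// Sketch only: recomputes the example by hand, assuming idf = log10(N / df).

/**
 * @param array $samples rows of raw term counts, one row per document
 * @return array rows of tf-idf weights with the same keys
 */
function tfIdfSketch(array $samples): array
{
    $documentCount = count($samples);
    $idf = [];

    // Document frequency: in how many samples does each term occur at least once?
    foreach (array_keys($samples[0]) as $term) {
        $documentFrequency = 0;
        foreach ($samples as $sample) {
            if ($sample[$term] > 0) {
                $documentFrequency++;
            }
        }
        // Assumed weighting; a term absent from every sample gets weight 0.
        $idf[$term] = $documentFrequency > 0
            ? log10($documentCount / $documentFrequency)
            : 0.0;
    }

    // Scale each raw count by the idf of its term.
    foreach ($samples as &$sample) {
        foreach ($sample as $term => $count) {
            $sample[$term] = $count * $idf[$term];
        }
    }
    unset($sample);

    return $samples;
}

$samples = [
    [0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
    [0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3],
];

$weights = tfIdfSketch($samples);
```

Terms 0 and 1 occur in both samples, so their idf is `log10(2 / 2) = 0` and their weights vanish. Terms 2 to 5 each occur in one sample, so their idf is `log10(2 / 1) ≈ 0.301`, which gives 0.602 for a count of 2 and 0.903 for a count of 3 after rounding to three decimals.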
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -25,6 +25,7 @@ pages:
- Imputation missing values: machine-learning/preprocessing/imputation-missing-values.md
- Feature Extraction:
- Token Count Vectorizer: machine-learning/feature-extraction/token-count-vectorizer.md
- Tf-idf Transformer: machine-learning/feature-extraction/tf-idf-transformer.md
- Datasets:
- Array Dataset: machine-learning/datasets/array-dataset.md
- CSV Dataset: machine-learning/datasets/csv-dataset.md
2 changes: 1 addition & 1 deletion tests/Phpml/FeatureExtraction/TfIdfTransformerTest.php
@@ -10,7 +10,7 @@ class TfIdfTransformerTest extends \PHPUnit_Framework_TestCase
{
public function testTfIdfTransformation()
{
//https://en.wikipedia.org/wiki/Tf%E2%80%93idf
// https://en.wikipedia.org/wiki/Tf-idf

$samples = [
[0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
