Merge pull request #12 from php-ai/develop
New features: ClassificationReport and FileDataset
akondas authored Jul 24, 2016
2 parents 9d900be + 403824d commit 0869043
Showing 71 changed files with 1,014 additions and 25 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
/vendor/
humbuglog.*
/bin/phpunit
.coverage
10 changes: 8 additions & 2 deletions CHANGELOG.md
@@ -3,9 +3,15 @@ CHANGELOG

This changelog references the relevant changes done in PHP-ML library.

* 0.2.0 (in plan)
* feature [Dataset] - FileDataset - load dataset from files (folders as targets)
* 0.1.3 (in plan/progress)
* SSE, SSTo, SSR [Regression] - sum of the squared
*

* 0.1.2 (2016-07-24)
* feature [Dataset] - FilesDataset - load dataset from files (folder names as targets)
* feature [Metric] - ClassificationReport - report about trained classifier
* bug [Feature Extraction] - fix problem with token count vectorizer array order
* tests [General] - add more tests for specific conditions

* 0.1.1 (2016-07-12)
* feature [Cross Validation] Stratified Random Split - equal distribution for targets in split
13 changes: 13 additions & 0 deletions README.md
@@ -1,13 +1,19 @@
# PHP-ML - Machine Learning library for PHP

[![Minimum PHP Version](https://img.shields.io/badge/php-%3E%3D%207.0-8892BF.svg)](https://php.net/)
[![Latest Stable Version](https://img.shields.io/packagist/v/php-ai/php-ml.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
[![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
[![Total Downloads](https://poser.pugx.org/php-ai/php-ml/downloads.svg)](https://packagist.org/packages/php-ai/php-ml)
[![License](https://poser.pugx.org/php-ai/php-ml/license.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/quality-score.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/?branch=develop)

![PHP-ML - Machine Learning library for PHP](docs/assets/php-ml-logo.png)

Fresh approach to Machine Learning in PHP. Algorithms, Cross Validation, Preprocessing, Feature Extraction and much more in one library.

PHP-ML requires PHP >= 7.0.

Simple example of classification:
```php
use Phpml\Classification\KNearestNeighbors;
@@ -34,6 +40,10 @@ Currently this library is in the process of developing, but You can install it w
composer require php-ai/php-ml
```

## Examples

Example scripts are available in a separate repository [php-ai/php-ml-examples](https://github.com/php-ai/php-ml-examples).

## Features

* Classification
@@ -49,6 +59,7 @@ composer require php-ai/php-ml
* Metric
    * [Accuracy](http://php-ml.readthedocs.io/en/latest/machine-learning/metric/accuracy/)
    * [Confusion Matrix](http://php-ml.readthedocs.io/en/latest/machine-learning/metric/confusion-matrix/)
    * [Classification Report](http://php-ml.readthedocs.io/en/latest/machine-learning/metric/classification-report/)
* Workflow
    * [Pipeline](http://php-ml.readthedocs.io/en/latest/machine-learning/workflow/pipeline)
* Cross Validation
@@ -61,7 +72,9 @@ composer require php-ai/php-ml
    * [Token Count Vectorizer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/token-count-vectorizer/)
    * [Tf-idf Transformer](http://php-ml.readthedocs.io/en/latest/machine-learning/feature-extraction/tf-idf-transformer/)
* Datasets
    * [Array](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/array-dataset/)
    * [CSV](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/csv-dataset/)
    * [Files](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/files-dataset/)
    * Ready to use:
        * [Iris](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/iris/)
        * [Wine](http://php-ml.readthedocs.io/en/latest/machine-learning/datasets/demo/wine/)
Binary file added docs/assets/php-ml-logo.png
15 changes: 14 additions & 1 deletion docs/index.md
@@ -1,12 +1,18 @@
# PHP-ML - Machine Learning library for PHP

[![Minimum PHP Version](https://img.shields.io/badge/php-%3E%3D%207.0-8892BF.svg)](https://php.net/)
[![Latest Stable Version](https://img.shields.io/packagist/v/php-ai/php-ml.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Build Status](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/build.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/build-status/develop)
[![Documentation Status](https://readthedocs.org/projects/php-ml/badge/?version=develop)](http://php-ml.readthedocs.org/en/develop/?badge=develop)
[![Total Downloads](https://poser.pugx.org/php-ai/php-ml/downloads.svg)](https://packagist.org/packages/php-ai/php-ml)
[![License](https://poser.pugx.org/php-ai/php-ml/license.svg)](https://packagist.org/packages/php-ai/php-ml)
[![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/php-ai/php-ml/badges/quality-score.png?b=develop)](https://scrutinizer-ci.com/g/php-ai/php-ml/?branch=develop)

Fresh approach to Machine Learning in PHP. Note that at the moment PHP is not the best choice for machine learning but maybe this will change ...
![PHP-ML - Machine Learning library for PHP](assets/php-ml-logo.png)

Fresh approach to Machine Learning in PHP. Algorithms, Cross Validation, Preprocessing, Feature Extraction and much more in one library.

PHP-ML requires PHP >= 7.0.

Simple example of classification:
```php
@@ -34,6 +40,10 @@ Currently this library is in the process of developing, but You can install it w
composer require php-ai/php-ml
```

## Examples

Example scripts are available in a separate repository [php-ai/php-ml-examples](https://github.com/php-ai/php-ml-examples).

## Features

* Classification
@@ -49,6 +59,7 @@ composer require php-ai/php-ml
* Metric
    * [Accuracy](machine-learning/metric/accuracy/)
    * [Confusion Matrix](machine-learning/metric/confusion-matrix/)
    * [Classification Report](machine-learning/metric/classification-report/)
* Workflow
    * [Pipeline](machine-learning/workflow/pipeline)
* Cross Validation
@@ -61,7 +72,9 @@ composer require php-ai/php-ml
    * [Token Count Vectorizer](machine-learning/feature-extraction/token-count-vectorizer/)
    * [Tf-idf Transformer](machine-learning/feature-extraction/tf-idf-transformer/)
* Datasets
    * [Array](machine-learning/datasets/array-dataset/)
    * [CSV](machine-learning/datasets/csv-dataset/)
    * [Files](machine-learning/datasets/files-dataset/)
    * Ready to use:
        * [Iris](machine-learning/datasets/demo/iris/)
        * [Wine](machine-learning/datasets/demo/wine/)
57 changes: 57 additions & 0 deletions docs/machine-learning/datasets/files-dataset.md
@@ -0,0 +1,57 @@
# FilesDataset

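Helper class that loads a dataset from files, using folder names as targets: each file inside a subfolder becomes one sample (the file's content), labelled with the name of that subfolder. It extends `ArrayDataset`.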

### Constructor Parameters

* $rootPath - (string) path to the root folder that contains the dataset files, one subfolder per target

```
use Phpml\Dataset\FilesDataset;
$dataset = new FilesDataset('path/to/data');
```

See [ArrayDataset](machine-learning/datasets/array-dataset/) for more information.

### Example

File structure:

```
data
    business
        001.txt
        002.txt
        ...
    entertainment
        001.txt
        002.txt
        ...
    politics
        001.txt
        002.txt
        ...
    sport
        001.txt
        002.txt
        ...
    tech
        001.txt
        002.txt
        ...
```

Load the files with `FilesDataset`:

```
use Phpml\Dataset\FilesDataset;

$dataset = new FilesDataset('path/to/data');

$dataset->getSamples()[0][0];  // content from file path/to/data/business/001.txt
$dataset->getTargets()[0];     // business

// index 40 assumes each of the four preceding folders holds 10 files
$dataset->getSamples()[40][0]; // content from file path/to/data/tech/001.txt
$dataset->getTargets()[40];    // tech
```
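The loading rule above can be reproduced without the library. The following is a minimal, self-contained sketch that mirrors the `scanRootPath()`/`scanDir()` logic of `FilesDataset.php`: every subfolder of the root is a target, and every regular file inside it becomes a one-element sample. The helper `loadFilesDataset()` and the throwaway directory layout are hypothetical, for illustration only. Because `glob()` sorts its results alphabetically by default, `business` files come before `tech` files even though `tech` is created first here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper mirroring FilesDataset: folder names become targets,
 * file contents become single-element samples.
 *
 * @return array Two lists: [samples, targets]
 */
function loadFilesDataset(string $rootPath): array
{
    if (!is_dir($rootPath)) {
        throw new InvalidArgumentException(sprintf('Dataset root folder "%s" missing.', $rootPath));
    }

    $samples = [];
    $targets = [];

    // glob() sorts alphabetically unless GLOB_NOSORT is passed.
    foreach (glob($rootPath.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR) ?: [] as $dir) {
        $target = basename($dir);

        // Only regular files are loaded; nested folders are skipped.
        foreach (array_filter(glob($dir.DIRECTORY_SEPARATOR.'*') ?: [], 'is_file') as $file) {
            $samples[] = [file_get_contents($file)];
            $targets[] = $target;
        }
    }

    return [$samples, $targets];
}

// Build a tiny throwaway tree: <tmp>/tech/001.txt and <tmp>/business/001.txt.
$root = sys_get_temp_dir().DIRECTORY_SEPARATOR.'files-dataset-demo-'.uniqid();
foreach (['tech' => 'new phone released', 'business' => 'shares rose today'] as $folder => $text) {
    mkdir($root.DIRECTORY_SEPARATOR.$folder, 0777, true);
    file_put_contents($root.DIRECTORY_SEPARATOR.$folder.DIRECTORY_SEPARATOR.'001.txt', $text);
}

list($samples, $targets) = loadFilesDataset($root);

echo $targets[0], ': ', $samples[0][0], PHP_EOL; // business: shares rose today
echo $targets[1], ': ', $samples[1][0], PHP_EOL; // tech: new phone released

// Remove the throwaway files and folders again.
foreach (glob($root.DIRECTORY_SEPARATOR.'*'.DIRECTORY_SEPARATOR.'*') ?: [] as $file) {
    unlink($file);
}
foreach (glob($root.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR) ?: [] as $dir) {
    rmdir($dir);
}
rmdir($root);
```

Samples and targets stay index-aligned, which is what lets `getSamples()[$i]` be paired with `getTargets()[$i]`.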
61 changes: 61 additions & 0 deletions docs/machine-learning/metric/classification-report.md
@@ -0,0 +1,61 @@
# Classification Report

Class for calculating the main classifier metrics: precision, recall, F1 score and support.

### Report

To generate a report, you must provide the following parameters:

* $actualLabels - (array) true sample labels
* $predictedLabels - (array) predicted labels (e.g. from the test group)

```
use Phpml\Metric\ClassificationReport;
$actualLabels = ['cat', 'ant', 'bird', 'bird', 'bird'];
$predictedLabels = ['cat', 'cat', 'bird', 'bird', 'ant'];
$report = new ClassificationReport($actualLabels, $predictedLabels);
```

### Metrics

After creating the report, you can retrieve its individual metrics:

* precision (`getPrecision()`) - fraction of retrieved instances that are relevant
* recall (`getRecall()`) - fraction of relevant instances that are retrieved
* F1 score (`getF1score()`) - harmonic mean of precision and recall, a single measure of a test's accuracy
* support (`getSupport()`) - number of actual samples of each class

```
$precision = $report->getPrecision();
// $precision = ['cat' => 0.5, 'ant' => 0.0, 'bird' => 1.0];
```
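To make the definitions above concrete, here is a minimal plain-PHP sketch (not the library's implementation) that computes the four per-class metrics for the example labels. The functions `safeRatio()` and `classificationMetrics()` are hypothetical. It uses precision = TP / (TP + FP), recall = TP / (TP + FN), F1 as their harmonic mean, and support as the number of actual samples of a class; reporting 0.0 for a zero denominator is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

// PHP's "/" can return an int (e.g. 2 / 2), so the float return type converts it;
// an undefined ratio (zero denominator) is reported as 0.0 in this sketch.
function safeRatio(int $numerator, int $denominator): float
{
    return $denominator > 0 ? $numerator / $denominator : 0.0;
}

/**
 * Hypothetical helper: per-class precision, recall, F1 score and support.
 * Assumes both arrays share the same keys (one prediction per actual label).
 */
function classificationMetrics(array $actualLabels, array $predictedLabels): array
{
    // Every class seen in either array, in order of first appearance.
    $labels = array_values(array_unique(array_merge($actualLabels, $predictedLabels)));
    $metrics = ['precision' => [], 'recall' => [], 'f1score' => [], 'support' => []];

    foreach ($labels as $label) {
        $truePositive = $falsePositive = $falseNegative = 0;

        foreach ($actualLabels as $i => $actual) {
            $predicted = $predictedLabels[$i];
            if ($predicted === $label && $actual === $label) {
                ++$truePositive;  // correctly predicted as $label
            } elseif ($predicted === $label) {
                ++$falsePositive; // predicted as $label, actually another class
            } elseif ($actual === $label) {
                ++$falseNegative; // actually $label, predicted as another class
            }
        }

        $precision = safeRatio($truePositive, $truePositive + $falsePositive);
        $recall = safeRatio($truePositive, $truePositive + $falseNegative);

        $metrics['precision'][$label] = $precision;
        $metrics['recall'][$label] = $recall;
        // F1 is the harmonic mean of precision and recall.
        $metrics['f1score'][$label] = ($precision + $recall) > 0.0
            ? 2 * $precision * $recall / ($precision + $recall)
            : 0.0;
        // Support counts the actual samples of this class.
        $metrics['support'][$label] = $truePositive + $falseNegative;
    }

    return $metrics;
}

$metrics = classificationMetrics(
    ['cat', 'ant', 'bird', 'bird', 'bird'],
    ['cat', 'cat', 'bird', 'bird', 'ant']
);

foreach ($metrics['support'] as $label => $support) {
    printf(
        "%s: precision %.2f, recall %.2f, f1 %.2f, support %d\n",
        $label,
        $metrics['precision'][$label],
        $metrics['recall'][$label],
        $metrics['f1score'][$label],
        $support
    );
}
// cat: precision 0.50, recall 1.00, f1 0.67, support 1
// ant: precision 0.00, recall 0.00, f1 0.00, support 1
// bird: precision 1.00, recall 0.67, f1 0.80, support 3
```

For `cat`, two samples were predicted as `cat` but only one is a cat, which is where the precision of 0.5 comes from.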

### Example

```
use Phpml\Metric\ClassificationReport;
$actualLabels = ['cat', 'ant', 'bird', 'bird', 'bird'];
$predictedLabels = ['cat', 'cat', 'bird', 'bird', 'ant'];
$report = new ClassificationReport($actualLabels, $predictedLabels);
$report->getPrecision();
// ['cat' => 0.5, 'ant' => 0.0, 'bird' => 1.0]
$report->getRecall();
// ['cat' => 1.0, 'ant' => 0.0, 'bird' => 0.67]
$report->getF1score();
// ['cat' => 0.67, 'ant' => 0.0, 'bird' => 0.80]
$report->getSupport();
// ['cat' => 1, 'ant' => 1, 'bird' => 3]
$report->getAverage();
// ['precision' => 0.75, 'recall' => 0.83, 'f1score' => 0.73]
```
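The averages listed for `getAverage()` are not the plain mean over all three classes: averaging precision over `cat`, `ant` and `bird` would give (0.5 + 0.0 + 1.0) / 3 = 0.5. The listed values are instead consistent with averaging only the non-zero per-class scores, e.g. (0.5 + 1.0) / 2 = 0.75. The short sketch below is an illustration, not the library's code: it takes the per-class values from the example (as exact fractions) and computes both variants so the difference is visible. `macroAverage()` and `nonZeroAverage()` are hypothetical names.

```php
<?php

declare(strict_types=1);

// Plain macro average: mean over every class (assumes a non-empty array).
function macroAverage(array $values): float
{
    return array_sum($values) / count($values);
}

// Mean over non-zero values only: array_filter() without a callback drops 0.0.
function nonZeroAverage(array $values): float
{
    $nonZero = array_filter($values);

    return count($nonZero) > 0 ? array_sum($nonZero) / count($nonZero) : 0.0;
}

// Per-class scores from the example above, as exact fractions instead of rounded values.
$scores = [
    'precision' => ['cat' => 0.5, 'ant' => 0.0, 'bird' => 1.0],
    'recall' => ['cat' => 1.0, 'ant' => 0.0, 'bird' => 2 / 3],
    'f1score' => ['cat' => 2 / 3, 'ant' => 0.0, 'bird' => 0.8],
];

foreach ($scores as $metric => $values) {
    printf("%s: macro %.2f, non-zero only %.2f\n", $metric, macroAverage($values), nonZeroAverage($values));
}
// precision: macro 0.50, non-zero only 0.75
// recall: macro 0.56, non-zero only 0.83
// f1score: macro 0.49, non-zero only 0.73
```

Which variant is appropriate depends on whether a class the classifier never gets right (here `ant`) should pull the average down.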
2 changes: 2 additions & 0 deletions mkdocs.yml
@@ -15,6 +15,7 @@ pages:
- Metric:
- Accuracy: machine-learning/metric/accuracy.md
- Confusion Matrix: machine-learning/metric/confusion-matrix.md
- Classification Report: machine-learning/metric/classification-report.md
- Workflow:
- Pipeline: machine-learning/workflow/pipeline.md
- Cross Validation:
@@ -29,6 +30,7 @@ pages:
- Datasets:
- Array Dataset: machine-learning/datasets/array-dataset.md
- CSV Dataset: machine-learning/datasets/csv-dataset.md
- Files Dataset: machine-learning/datasets/files-dataset.md
- Ready to use datasets:
- Iris: machine-learning/datasets/demo/iris.md
- Wine: machine-learning/datasets/demo/wine.md
5 changes: 0 additions & 5 deletions src/Phpml/Dataset/CsvDataset.php
@@ -8,11 +8,6 @@

class CsvDataset extends ArrayDataset
{
    /**
     * @var string
     */
    protected $filepath;

    /**
     * @param string $filepath
     * @param int $features
47 changes: 47 additions & 0 deletions src/Phpml/Dataset/FilesDataset.php
@@ -0,0 +1,47 @@
<?php

declare (strict_types = 1);

namespace Phpml\Dataset;

use Phpml\Exception\DatasetException;

class FilesDataset extends ArrayDataset
{
    /**
     * @param string $rootPath
     *
     * @throws DatasetException
     */
    public function __construct(string $rootPath)
    {
        if (!is_dir($rootPath)) {
            throw DatasetException::missingFolder($rootPath);
        }

        $this->scanRootPath($rootPath);
    }

    /**
     * @param string $rootPath
     */
    private function scanRootPath(string $rootPath)
    {
        foreach (glob($rootPath.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR) as $dir) {
            $this->scanDir($dir);
        }
    }

    /**
     * @param string $dir
     */
    private function scanDir(string $dir)
    {
        $target = basename($dir);

        foreach (array_filter(glob($dir.DIRECTORY_SEPARATOR.'*'), 'is_file') as $file) {
            $this->samples[] = [file_get_contents($file)];
            $this->targets[] = $target;
        }
    }
}
12 changes: 10 additions & 2 deletions src/Phpml/Exception/DatasetException.php
@@ -11,11 +11,19 @@ class DatasetException extends \Exception
     */
    public static function missingFile($filepath)
    {
        return new self(sprintf('Dataset file %s missing.', $filepath));
        return new self(sprintf('Dataset file "%s" missing.', $filepath));
    }

    /**
     * @return DatasetException
     */
    public static function missingFolder($path)
    {
        return new self(sprintf('Dataset root folder "%s" missing.', $path));
    }

    public static function cantOpenFile($filepath)
    {
        return new self(sprintf('Dataset file %s can\'t be open.', $filepath));
        return new self(sprintf('Dataset file "%s" can\'t be open.', $filepath));
    }
}
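`DatasetException` builds its instances through static named constructors, so each failure formats its message in one place; this commit also wraps the path in double quotes. A minimal stand-alone sketch of the same pattern follows. The class `DemoDatasetException` and the `loadRoot()` helper are hypothetical and exist only so the snippet runs without PHP-ML installed; the message format is copied from `missingFolder()` above.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for Phpml\Exception\DatasetException.
class DemoDatasetException extends \Exception
{
    public static function missingFolder(string $path): self
    {
        return new self(sprintf('Dataset root folder "%s" missing.', $path));
    }
}

// Hypothetical loader that validates its argument the way FilesDataset does.
function loadRoot(string $rootPath): string
{
    if (!is_dir($rootPath)) {
        throw DemoDatasetException::missingFolder($rootPath);
    }

    return $rootPath;
}

try {
    loadRoot('path/to/missing data');
} catch (DemoDatasetException $e) {
    echo $e->getMessage(), PHP_EOL; // Dataset root folder "path/to/missing data" missing.
}
```

The quotes make the boundaries of a path that contains spaces visible in the message.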
8 changes: 0 additions & 8 deletions src/Phpml/Exception/NormalizerException.php
@@ -13,12 +13,4 @@ public static function unknownNorm()
    {
        return new self('Unknown norm supplied.');
    }

    /**
     * @return NormalizerException
     */
    public static function fitNotAllowed()
    {
        return new self('Fit is not allowed for this preprocessor.');
    }
}
2 changes: 2 additions & 0 deletions src/Phpml/FeatureExtraction/TokenCountVectorizer.php
@@ -116,6 +116,8 @@ private function transformSample(string &$sample)
}
}

ksort($counts);

$sample = $counts;
}
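The added `ksort($counts)` matters when per-sample counts are keyed by vocabulary index but filled in the order tokens appear, with words absent from the sample appended afterwards as zeros. The sketch below is a simplified, hypothetical re-creation (whitespace tokenizer, hand-written vocabulary, helper name `countTokens()`), not the library's code; it shows how the key order drifts between samples and how `ksort()` restores a stable column order.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, simplified token counter returning counts keyed by vocabulary index.
 * Assumes every token in $text is present in $vocabulary (token => column index).
 */
function countTokens(string $text, array $vocabulary, bool $sortByIndex): array
{
    $counts = [];
    foreach (explode(' ', $text) as $token) {
        $index = $vocabulary[$token];
        $counts[$index] = ($counts[$index] ?? 0) + 1;
    }

    // Vocabulary words missing from this sample get a zero count, appended at the end.
    foreach ($vocabulary as $index) {
        if (!isset($counts[$index])) {
            $counts[$index] = 0;
        }
    }

    if ($sortByIndex) {
        ksort($counts); // put keys back into column order 0, 1, 2, ...
    }

    return $counts;
}

$vocabulary = ['lorem' => 0, 'ipsum' => 1, 'dolor' => 2];

// Without ksort() the key order depends on the sample, so positional access is misaligned.
echo implode(', ', array_values(countTokens('dolor dolor lorem', $vocabulary, false))), PHP_EOL; // 2, 1, 0
echo implode(', ', array_values(countTokens('dolor dolor lorem', $vocabulary, true))), PHP_EOL;  // 1, 0, 2
```

With the sort, position `i` of every transformed sample holds the count of the word whose vocabulary index is `i`.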
