Updates to the documentation (linguistic corrections) (#414)
* Fix typo in Features list

* Update distance.md documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation

* Fix grammatical mistakes in documentation
a-bakos authored and akondas committed Nov 2, 2019
1 parent f30e576 commit 7d5c6b1
Showing 27 changed files with 82 additions and 83 deletions.
8 changes: 4 additions & 4 deletions docs/machine-learning/association/apriori.md
@@ -15,7 +15,7 @@ $associator = new Apriori($support = 0.5, $confidence = 0.5);

### Train

To train a associator simply provide train samples and labels (as `array`). Example:
To train an associator, simply provide train samples and labels (as `array`). Example:

```
$samples = [['alpha', 'beta', 'epsilon'], ['alpha', 'beta', 'theta'], ['alpha', 'beta', 'epsilon'], ['alpha', 'beta', 'theta']];
@@ -31,7 +31,7 @@ You can train the associator using multiple data sets, predictions will be based

### Predict

To predict sample label use `predict` method. You can provide one sample or array of samples:
To predict sample label use the `predict` method. You can provide one sample or array of samples:

```
$associator->predict(['alpha','theta']);
@@ -43,7 +43,7 @@ $associator->predict([['alpha','epsilon'],['beta','theta']]);

### Associating

Get generated association rules simply use `rules` method.
To get generated association rules, simply use the `rules` method.

```
$associator->getRules();
@@ -52,7 +52,7 @@ $associator->getRules();

### Frequent item sets

Generating k-length frequent item sets simply use `apriori` method.
To generate k-length frequent item sets, simply use the `apriori` method.

```
$associator->apriori();
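For context, a minimal end-to-end sketch of the Apriori usage these hunks document, assembled from the snippets above (the empty `$labels` array is an assumption, since association rule mining needs no targets):

```php
use Phpml\Association\Apriori;

$samples = [['alpha', 'beta', 'epsilon'], ['alpha', 'beta', 'theta'], ['alpha', 'beta', 'epsilon'], ['alpha', 'beta', 'theta']];
$labels  = [];  // assumed empty; Apriori does not need targets

$associator = new Apriori($support = 0.5, $confidence = 0.5);
$associator->train($samples, $labels);

$associator->predict(['alpha', 'theta']);  // items frequently associated with the given ones
$associator->getRules();                   // generated association rules
$associator->apriori();                    // k-length frequent item sets
```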
4 changes: 2 additions & 2 deletions docs/machine-learning/classification/k-nearest-neighbors.md
@@ -14,7 +14,7 @@ $classifier = new KNearestNeighbors($k=3, new Minkowski($lambda=4));

## Train

To train a classifier simply provide train samples and labels (as `array`). Example:
To train a classifier, simply provide train samples and labels (as `array`). Example:

```
$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
@@ -28,7 +28,7 @@ You can train the classifier using multiple data sets, predictions will be based

## Predict

To predict sample label use `predict` method. You can provide one sample or array of samples:
To predict sample label use the `predict` method. You can provide one sample or array of samples:

```
$classifier->predict([3, 2]);
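A runnable sketch of the k-NN workflow described above; the labels and the predicted outputs are illustrative assumptions, not taken from the repository:

```php
use Phpml\Classification\KNearestNeighbors;

$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels  = ['a', 'a', 'a', 'b', 'b', 'b'];   // one label per sample (assumed)

$classifier = new KNearestNeighbors();       // defaults: k = 3, Euclidean distance
$classifier->train($samples, $labels);

$classifier->predict([3, 2]);                // e.g. 'b'
$classifier->predict([[3, 2], [1, 5]]);      // e.g. ['b', 'a']
```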
4 changes: 2 additions & 2 deletions docs/machine-learning/classification/naive-bayes.md
@@ -4,7 +4,7 @@ Classifier based on applying Bayes' theorem with strong (naive) independence ass

### Train

To train a classifier simply provide train samples and labels (as `array`). Example:
To train a classifier, simply provide train samples and labels (as `array`). Example:

```
$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
@@ -18,7 +18,7 @@ You can train the classifier using multiple data sets, predictions will be based

### Predict

To predict sample label use `predict` method. You can provide one sample or array of samples:
To predict sample label use the `predict` method. You can provide one sample or array of samples:

```
$classifier->predict([3, 1, 1]);
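A similar sketch for NaiveBayes, again with assumed labels and illustrative predictions:

```php
use Phpml\Classification\NaiveBayes;

$samples = [[5, 1, 1], [1, 5, 1], [1, 1, 5]];
$labels  = ['a', 'b', 'c'];   // one label per sample (assumed)

$classifier = new NaiveBayes();
$classifier->train($samples, $labels);

$classifier->predict([3, 1, 1]);               // e.g. 'a'
$classifier->predict([[3, 1, 1], [1, 4, 1]]);  // e.g. ['a', 'b']
```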
6 changes: 3 additions & 3 deletions docs/machine-learning/classification/svc.md
@@ -21,7 +21,7 @@ $classifier = new SVC(Kernel::RBF, $cost = 1000, $degree = 3, $gamma = 6);

### Train

To train a classifier simply provide train samples and labels (as `array`). Example:
To train a classifier, simply provide train samples and labels (as `array`). Example:

```
use Phpml\Classification\SVC;
@@ -38,7 +38,7 @@ You can train the classifier using multiple data sets, predictions will be based

### Predict

To predict sample label use `predict` method. You can provide one sample or array of samples:
To predict sample label use the `predict` method. You can provide one sample or array of samples:

```
$classifier->predict([3, 2]);
@@ -74,7 +74,7 @@ $classifier = new SVC(
$classifier->train($samples, $labels);
```

Then use `predictProbability` method instead of `predict`:
Then use the `predictProbability` method instead of `predict`:

```
$classifier->predictProbability([3, 2]);
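A sketch of basic SVC training and prediction under the same assumptions (labels and outputs are illustrative); to call `predictProbability`, the classifier additionally has to be constructed with probability estimates enabled, as the collapsed constructor above indicates:

```php
use Phpml\Classification\SVC;
use Phpml\SupportVectorMachine\Kernel;

$samples = [[1, 3], [1, 4], [2, 4], [3, 1], [4, 1], [4, 2]];
$labels  = ['a', 'a', 'a', 'b', 'b', 'b'];   // one label per sample (assumed)

$classifier = new SVC(Kernel::RBF, $cost = 1000, $degree = 3, $gamma = 6);
$classifier->train($samples, $labels);

$classifier->predict([3, 2]);            // e.g. 'b'
$classifier->predict([[3, 2], [1, 5]]);  // e.g. ['b', 'a']
```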
4 changes: 2 additions & 2 deletions docs/machine-learning/clustering/dbscan.md
@@ -16,12 +16,12 @@ $dbscan = new DBSCAN($epsilon = 2, $minSamples = 3, new Minkowski($lambda=4));

### Clustering

To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside.
To divide the samples into clusters, simply use the `cluster` method. It returns the `array` of clusters with samples inside.

```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
$dbscan = new DBSCAN($epsilon = 2, $minSamples = 3);
$dbscan->cluster($samples);
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
// return [0=>[[1, 1], ...], 1=>[[8, 7], ...]]
```
10 changes: 5 additions & 5 deletions docs/machine-learning/clustering/k-means.md
@@ -1,6 +1,6 @@
# K-means clustering

The K-Means algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
The K-Means algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
This algorithm requires the number of clusters to be specified.

### Constructor Parameters
@@ -15,11 +15,11 @@ $kmeans = new KMeans(4, KMeans::INIT_RANDOM);

### Clustering

To divide the samples into clusters simply use `cluster` method. It's return the `array` of clusters with samples inside.
To divide the samples into clusters, simply use the `cluster` method. It returns the `array` of clusters with samples inside.

```
$samples = [[1, 1], [8, 7], [1, 2], [7, 8], [2, 1], [8, 9]];
Or if you need to keep your indentifiers along with yours samples you can use array keys as labels.
Or if you need to keep your identifiers along with yours samples you can use array keys as labels.
$samples = [ 'Label1' => [1, 1], 'Label2' => [8, 7], 'Label3' => [1, 2]];
$kmeans = new KMeans(2);
@@ -32,8 +32,8 @@ $kmeans->cluster($samples);
#### kmeans++ (default)

K-means++ method selects initial cluster centers for k-mean clustering in a smart way to speed up convergence.
It use the DASV seeding method consists of finding good initial centroids for the clusters.
It uses the DASV seeding method consists of finding good initial centroids for the clusters.

#### random

Random initialization method chooses completely random centroid. It get the space boundaries to avoid placing clusters centroid too far from samples data.
Random initialization method chooses completely random centroid. It gets the space boundaries to avoid placing cluster centroids too far from samples data.
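A sketch of the labelled-samples variant described in the hunk above; the exact shape of the returned clusters is an assumption:

```php
use Phpml\Clustering\KMeans;

// Keep identifiers by using array keys as labels:
$samples = ['Label1' => [1, 1], 'Label2' => [8, 7], 'Label3' => [1, 2]];

$kmeans = new KMeans(2);
$kmeans->cluster($samples);
// e.g. [0 => ['Label1' => [1, 1], 'Label3' => [1, 2]], 1 => ['Label2' => [8, 7]]]
```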
6 changes: 3 additions & 3 deletions docs/machine-learning/cross-validation/random-split.md
@@ -1,20 +1,20 @@
# Random Split

One of the simplest methods from Cross-validation is implemented as `RandomSpilt` class. Samples are split to two groups: train group and test group. You can adjust number of samples in each group.
One of the simplest methods from Cross-validation is implemented as `RandomSpilt` class. Samples are split to two groups: train group and test group. You can adjust the number of samples in each group.

### Constructor Parameters

* $dataset - object that implements `Dataset` interface
* $testSize - a fraction of test split (float, from 0 to 1, default: 0.3)
* $seed - seed for random generator (e.g. for tests)

```
$randomSplit = new RandomSplit($dataset, 0.2);
```

### Samples and labels groups

To get samples or labels from test and train group you can use getters:
To get samples or labels from test and train group, you can use getters:

```
$dataset = new RandomSplit($dataset, 0.3, 1234);
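A sketch of splitting a dataset and reading back the groups; the getter names are assumed from the library's split API, since they sit in the part of the file the diff collapses:

```php
use Phpml\CrossValidation\RandomSplit;
use Phpml\Dataset\ArrayDataset;

$dataset = new ArrayDataset(
    [[1, 1], [2, 1], [3, 2], [4, 1]],   // assumed samples
    ['a', 'a', 'b', 'b']                // assumed labels
);

$split = new RandomSplit($dataset, 0.3, 1234);  // 30% test split, fixed seed

$split->getTrainSamples();
$split->getTrainLabels();
$split->getTestSamples();
$split->getTestLabels();
```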
docs/machine-learning/cross-validation/stratified-random-split.md
@@ -1,22 +1,22 @@
# Stratified Random Split

Analogously to `RandomSpilt` class samples are split to two groups: train group and test group.
Analogously to `RandomSpilt` class, samples are split to two groups: train group and test group.
Distribution of samples takes into account their targets and trying to divide them equally.
You can adjust number of samples in each group.
You can adjust the number of samples in each group.

### Constructor Parameters

* $dataset - object that implements `Dataset` interface
* $testSize - a fraction of test split (float, from 0 to 1, default: 0.3)
* $seed - seed for random generator (e.g. for tests)

```
$split = new StratifiedRandomSplit($dataset, 0.2);
```

### Samples and labels groups

To get samples or labels from test and train group you can use getters:
To get samples or labels from test and train group, you can use getters:

```
$dataset = new StratifiedRandomSplit($dataset, 0.3, 1234);
@@ -41,4 +41,4 @@ $dataset = new ArrayDataset(
$split = new StratifiedRandomSplit($dataset, 0.5);
```

Split will have equals amount of each target. Two of the target `a` and two of `b`.
Split will have equal amounts of each target. Two of the target `a` and two of `b`.
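A sketch of the equal-target behaviour described in this last hunk, with the dataset assumed from the visible `ArrayDataset` fragment and the getter name assumed as above:

```php
use Phpml\CrossValidation\StratifiedRandomSplit;
use Phpml\Dataset\ArrayDataset;

$dataset = new ArrayDataset(
    [[1, 1], [2, 1], [3, 2], [4, 1]],   // assumed samples
    ['a', 'a', 'b', 'b']                // two samples per target
);

$split = new StratifiedRandomSplit($dataset, 0.5);

// With a 0.5 test size each group should hold one 'a' and one 'b',
// because the split keeps the target distribution balanced.
$split->getTestLabels();
```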
6 changes: 3 additions & 3 deletions docs/machine-learning/datasets/array-dataset.md
@@ -2,7 +2,7 @@

Helper class that holds data as PHP `array` type. Implements the `Dataset` interface which is used heavily in other classes.

### Constructors Parameters
### Constructor Parameters

* $samples - (array) of samples
* $labels - (array) of labels
@@ -15,7 +15,7 @@ $dataset = new ArrayDataset([[1, 1], [2, 1], [3, 2], [4, 1]], ['a', 'a', 'b', 'b

### Samples and labels

To get samples or labels you can use getters:
To get samples or labels, you can use getters:

```
$dataset->getSamples();
@@ -24,7 +24,7 @@ $dataset->getTargets();

### Remove columns

You can remove columns by index numbers, for example:
You can remove columns by their index numbers, for example:

```
use Phpml\Dataset\ArrayDataset;
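A sketch combining the getters and the column removal mentioned above; the `removeColumns` call and its effect are assumptions based on the surrounding text:

```php
use Phpml\Dataset\ArrayDataset;

$dataset = new ArrayDataset([[1, 1], [2, 1], [3, 2], [4, 1]], ['a', 'a', 'b', 'b']);

$dataset->getSamples();   // [[1, 1], [2, 1], [3, 2], [4, 1]]
$dataset->getTargets();   // ['a', 'a', 'b', 'b']

$dataset->removeColumns([0]);  // drop the first feature column from every sample
$dataset->getSamples();        // [[1], [1], [2], [1]]
```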
4 changes: 2 additions & 2 deletions docs/machine-learning/datasets/csv-dataset.md
@@ -2,11 +2,11 @@

Helper class that loads data from CSV file. It extends the `ArrayDataset`.

### Constructors Parameters
### Constructor Parameters

* $filepath - (string) path to `.csv` file
* $features - (int) number of columns that are features (starts from first column), last column must be a label
* $headingRow - (bool) define is file have a heading row (if `true` then first row will be ignored)
* $headingRow - (bool) define if the file has a heading row (if `true` then first row will be ignored)

```
$dataset = new CsvDataset('dataset.csv', 2, true);
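A sketch of how the constructor parameters above fit together; `dataset.csv` is a placeholder file:

```php
use Phpml\Dataset\CsvDataset;

// Two feature columns, last column used as the label, first row treated as a heading and skipped.
$dataset = new CsvDataset('dataset.csv', 2, true);

$dataset->getSamples();  // the two feature columns
$dataset->getTargets();  // the last column as labels
```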
4 changes: 2 additions & 2 deletions docs/machine-learning/datasets/files-dataset.md
@@ -2,7 +2,7 @@

Helper class that loads dataset from files. Use folder names as targets. It extends the `ArrayDataset`.

### Constructors Parameters
### Constructor Parameters

* $rootPath - (string) path to root folder that contains files dataset

@@ -42,7 +42,7 @@ data
...
```

Load files data with `FilesDataset`:
Load files data with `FilesDataset`:

```
use Phpml\Dataset\FilesDataset;
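A sketch of loading a folder layout like the one outlined above; the paths and category names are placeholders:

```php
use Phpml\Dataset\FilesDataset;

// Assumes a layout such as data/business/001.txt, data/sports/001.txt, ...
$dataset = new FilesDataset('path/to/data');

$dataset->getSamples();  // file contents
$dataset->getTargets();  // folder names used as targets, e.g. 'business', 'sports'
```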
4 changes: 2 additions & 2 deletions docs/machine-learning/datasets/mnist-dataset.md
@@ -1,6 +1,6 @@
# MnistDataset

Helper class that load data from MNIST dataset: [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)
Helper class that loads data from MNIST dataset: [http://yann.lecun.com/exdb/mnist/](http://yann.lecun.com/exdb/mnist/)

> The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
@@ -18,7 +18,7 @@ $trainDataset = new MnistDataset('train-images-idx3-ubyte', 'train-labels-idx1-u

### Samples and labels

To get samples or labels you can use getters:
To get samples or labels, you can use getters:

```
$dataset->getSamples();
2 changes: 1 addition & 1 deletion docs/machine-learning/datasets/svm-dataset.md
@@ -2,7 +2,7 @@

Helper class that loads data from SVM-Light format file. It extends the `ArrayDataset`.

### Constructors Parameters
### Constructor Parameters

* $filepath - (string) path to the file

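A sketch for completeness; `dataset.svm` is a placeholder SVM-Light file, and the getters are inherited from `ArrayDataset`:

```php
use Phpml\Dataset\SvmDataset;

$dataset = new SvmDataset('dataset.svm');

$dataset->getSamples();
$dataset->getTargets();
```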
docs/machine-learning/feature-extraction/tf-idf-transformer.md
@@ -19,7 +19,7 @@ $transformer = new TfIdfTransformer($samples);

### Transformation

To transform a collection of text samples use `transform` method. Example:
To transform a collection of text samples, use the `transform` method. Example:

```
use Phpml\FeatureExtraction\TfIdfTransformer;
@@ -28,7 +28,7 @@ $samples = [
[0 => 1, 1 => 1, 2 => 2, 3 => 1, 4 => 0, 5 => 0],
[0 => 1, 1 => 1, 2 => 0, 3 => 0, 4 => 2, 5 => 3],
];
$transformer = new TfIdfTransformer($samples);
$transformer->transform($samples);
@@ -38,5 +38,5 @@ $samples = [
[0 => 0, 1 => 0, 2 => 0, 3 => 0, 4 => 0.602, 5 => 0.903],
];
*/
```
docs/machine-learning/feature-extraction/token-count-vectorizer.md
@@ -16,7 +16,7 @@ $vectorizer = new TokenCountVectorizer(new WhitespaceTokenizer());

### Transformation

To transform a collection of text samples use `transform` method. Example:
To transform a collection of text samples, use the `transform` method. Example:

```
$samples = [
@@ -42,7 +42,7 @@ $vectorizer->transform($samples);

### Vocabulary

You can extract vocabulary using `getVocabulary()` method. Example:
You can extract vocabulary using the `getVocabulary()` method. Example:

```
$vectorizer->getVocabulary();
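A sketch of the transform-then-inspect-vocabulary flow described above; the text samples and the resulting vocabulary are illustrative:

```php
use Phpml\FeatureExtraction\TokenCountVectorizer;
use Phpml\Tokenization\WhitespaceTokenizer;

$samples = [
    'Lorem ipsum dolor sit amet dolor',
    'Mauris placerat ipsum dolor',
];

$vectorizer = new TokenCountVectorizer(new WhitespaceTokenizer());
$vectorizer->fit($samples);        // build the vocabulary
$vectorizer->transform($samples);  // each sample becomes an array of token counts

$vectorizer->getVocabulary();      // e.g. [0 => 'Lorem', 1 => 'ipsum', 2 => 'dolor', ...]
```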
12 changes: 6 additions & 6 deletions docs/machine-learning/feature-selection/selectkbest.md
@@ -5,7 +5,7 @@
## Constructor Parameters

* $k (int) - number of top features to select, rest will be removed (default: 10)
* $scoringFunction (ScoringFunction) - function that take samples and targets and return array with scores (default: ANOVAFValue)
* $scoringFunction (ScoringFunction) - function that takes samples and targets and returns an array with scores (default: ANOVAFValue)

```php
use Phpml\FeatureSelection\SelectKBest;
@@ -27,13 +27,13 @@ $selector->fit($samples = $dataset->getSamples(), $dataset->getTargets());
$selector->transform($samples);

/*
$samples[0] = [1.4, 0.2];
$samples[0] = [1.4, 0.2];
*/
```

## Scores

You can get a array with the calculated score for each feature.
You can get an array with the calculated score for each feature.
A higher value means that a given feature is better suited for learning.
Of course, the rating depends on the scoring function used.

@@ -56,7 +56,7 @@ $selector->scores();
float(1179.0343277002)
[3]=>
float(959.32440572573)
}
}
*/
```

@@ -70,11 +70,11 @@ For classification:
The test is applied to samples from two or more groups, possibly with differing sizes.

For regression:
- **UnivariateLinearRegression**
- **UnivariateLinearRegression**
Quick linear model for testing the effect of a single regressor, sequentially for many regressors.
This is done in 2 steps:
- 1. The cross correlation between each regressor and the target is computed, that is, ((X[:, i] - mean(X[:, i])) * (y - mean_y)) / (std(X[:, i]) *std(y)).
- 2. It is converted to an F score
- 2. It is converted to an F score

## Pipeline

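A sketch of the fit/transform/scores flow these hunks touch; the sample data and targets are illustrative, and `transform` is assumed to reduce the samples in place to the selected columns:

```php
use Phpml\FeatureSelection\SelectKBest;

$samples = [[1, 10, 3], [2, 11, 4], [8, 20, 3], [9, 21, 4]];  // assumed samples
$targets = ['a', 'a', 'b', 'b'];                              // assumed class targets

$selector = new SelectKBest(2);      // keep the two best-scoring features
$selector->fit($samples, $targets);
$selector->transform($samples);      // $samples should now hold only the two most discriminative columns

$selector->scores();                 // one score per original feature
```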
