Matthews correlation coefficient (huggingface#362)
* edit metrics/matthews_correlation/matthews_correlation.py

* Multilabel options for Matthews correlation coefficient metric

* some input checks

* some examples/tests

* fix

* Update metrics/matthews_correlation/matthews_correlation.py

Co-authored-by: Leandro von Werra <[email protected]>

* docs

Co-authored-by: Sander Land <[email protected]>
Co-authored-by: Leandro von Werra <[email protected]>
3 people authored Nov 17, 2022
1 parent a12836b commit 4ca8eed
Showing 2 changed files with 43 additions and 5 deletions.
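
The multilabel behaviour this commit adds (one coefficient per feature column, optionally macro-averaged) can be sketched in plain Python. This is a minimal illustration, not the metric's implementation: the committed code delegates to scikit-learn's `matthews_corrcoef`, whereas the helpers `binary_mcc` and `multilabel_mcc` below are hypothetical names introduced here, handle only unweighted 0/1 labels, and assume that an undefined coefficient (zero denominator) is reported as 0.0.

```python
from math import sqrt


def binary_mcc(y_true, y_pred):
    """Unweighted Matthews correlation coefficient for 0/1 labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when the coefficient is undefined (zero denominator).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def multilabel_mcc(references, predictions, average=None):
    """One coefficient per feature column, or their unweighted mean for average='macro'."""
    n_features = len(references[0])
    per_feature = [
        binary_mcc([row[i] for row in references], [row[i] for row in predictions])
        for i in range(n_features)
    ]
    if average == "macro":
        return sum(per_feature) / len(per_feature)
    if average is not None:
        raise ValueError("Invalid `average`: expected 'macro' or None")
    return per_feature


# Same inputs as the multilabel docstring examples in the diff.
references = [[0, 1], [1, 0], [1, 1]]
predictions = [[0, 1], [1, 1], [0, 1]]
print(multilabel_mcc(references, predictions))                   # [0.5, 0.0]
print(multilabel_mcc(references, predictions, average="macro"))  # 0.25
```

The second column scores 0.0 because every prediction in it is 1, so the denominator vanishes; the macro average of 0.5 and 0.0 is 0.25.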
3 changes: 2 additions & 1 deletion metrics/matthews_correlation/README.md
@@ -48,9 +48,10 @@ At minimum, this metric requires a list of predictions and a list of references:
 - **`predictions`** (`list` of `int`s): Predicted class labels.
 - **`references`** (`list` of `int`s): Ground truth labels.
 - **`sample_weight`** (`list` of `int`s, `float`s, or `bool`s): Sample weights. Defaults to `None`.
+- **`average`**(`None` or `macro`): For the multilabel case, whether to return one correlation coefficient per feature (`average=None`), or the average of them (`average='macro'`). Defaults to `None`.

 ### Output Values
-- **`matthews_correlation`** (`float`): Matthews correlation coefficient.
+- **`matthews_correlation`** (`float` or `list` of `float`s): Matthews correlation coefficient, or list of them in the multilabel case without averaging.

 The metric output takes the following form:
 ```python
45 changes: 41 additions & 4 deletions metrics/matthews_correlation/matthews_correlation.py
@@ -14,6 +14,7 @@
 """Matthews Correlation metric."""

 import datasets
+import numpy as np
 from sklearn.metrics import matthews_corrcoef

 import evaluate
@@ -36,6 +37,9 @@
 Args:
     predictions (list of int): Predicted labels, as returned by a model.
     references (list of int): Ground truth labels.
+    average (`string`): This parameter is used for multilabel configs. Defaults to `None`.
+        - None (default): Returns an array of Matthews correlation coefficients, one for each feature
+        - 'macro': Calculate metrics for each feature, and find their unweighted mean.
     sample_weight (list of int, float, or bool): Sample weights. Defaults to `None`.
 Returns:
     matthews_correlation (dict containing float): Matthews correlation.
@@ -62,6 +66,21 @@
         ...     sample_weight=[0.5, 1, 0, 0, 0, 1])
         >>> print(round(results['matthews_correlation'], 2))
         -0.25
+    Example 4, Multi-label without averaging:
+        >>> matthews_metric = evaluate.load("matthews_correlation", config_name="multilabel")
+        >>> results = matthews_metric.compute(references=[[0,1], [1,0], [1,1]],
+        ...                                   predictions=[[0,1], [1,1], [0,1]])
+        >>> print(results['matthews_correlation'])
+        [0.5, 0.0]
+    Example 5, Multi-label with averaging:
+        >>> matthews_metric = evaluate.load("matthews_correlation", config_name="multilabel")
+        >>> results = matthews_metric.compute(references=[[0,1], [1,0], [1,1]],
+        ...                                   predictions=[[0,1], [1,1], [0,1]],
+        ...                                   average='macro')
+        >>> print(round(results['matthews_correlation'], 2))
+        0.25
 """

 _CITATION = """\
@@ -88,6 +107,11 @@ def _info(self):
             inputs_description=_KWARGS_DESCRIPTION,
             features=datasets.Features(
                 {
+                    "predictions": datasets.Sequence(datasets.Value("int32")),
+                    "references": datasets.Sequence(datasets.Value("int32")),
+                }
+                if self.config_name == "multilabel"
+                else {
                     "predictions": datasets.Value("int32"),
                     "references": datasets.Value("int32"),
                 }
@@ -97,7 +121,20 @@ def _info(self):
             ],
         )

-    def _compute(self, predictions, references, sample_weight=None):
-        return {
-            "matthews_correlation": float(matthews_corrcoef(references, predictions, sample_weight=sample_weight)),
-        }
+    def _compute(self, predictions, references, sample_weight=None, average=None):
+        if self.config_name == "multilabel":
+            references = np.array(references)
+            predictions = np.array(predictions)
+            if not (references.ndim == 2 and predictions.ndim == 2):
+                raise ValueError("For multi-label inputs, both references and predictions should be 2-dimensional")
+            matthews_corr = [
+                matthews_corrcoef(predictions[:, i], references[:, i], sample_weight=sample_weight)
+                for i in range(references.shape[1])
+            ]
+            if average == "macro":
+                matthews_corr = np.mean(matthews_corr)
+            elif average is not None:
+                raise ValueError("Invalid `average`: expected `macro`, or None ")
+        else:
+            matthews_corr = float(matthews_corrcoef(references, predictions, sample_weight=sample_weight))
+        return {"matthews_correlation": matthews_corr}
