
Commit 1f1c3b0

Added Normalization and Standardization Algorithms (TheAlgorithms#2192)
* Added Standardization and Normalization algorithms with built-in stats
* Implement ndigits for rounding

Co-authored-by: Christian Clauss <[email protected]>
1 parent 6c2c08c commit 1f1c3b0

1 file changed: +62 -0 lines changed

@@ -0,0 +1,62 @@
"""
Normalization Wikipedia: https://en.wikipedia.org/wiki/Normalization
Normalization (min-max scaling) is the process of converting numerical data to a
standard range of values, typically [0, 1] or [-1, 1]. The equation for scaling to
[0, 1] is x_norm = (x - x_min)/(x_max - x_min) where x_norm is the normalized value,
x is the original value, x_min is the minimum value within the column or list of data,
and x_max is the maximum value within the column or list of data. Normalization puts
all of the data on a similar scale, which can speed up model training. This is useful
because large differences in the ranges of features can heavily impact optimization
(particularly Gradient Descent).

Standardization Wikipedia: https://en.wikipedia.org/wiki/Standardization
Standardization is the process of rescaling numerical data so that it has a mean of 0
and a standard deviation of 1. It shifts and scales the data but does not change the
shape of its distribution. This is also known as z-score normalization. The equation
for standardization is x_std = (x - mu)/(sigma) where mu is the mean of the column or
list of values and sigma is the standard deviation of the column or list of values.

Choosing between Normalization & Standardization is more of an art than a science, but
it is often recommended to run experiments with both to see which performs better.
Additionally, a few rules of thumb are:
    1. gaussian (normal) distributions work better with standardization
    2. non-gaussian (non-normal) distributions work better with normalization
    3. If a column or list of values has extreme values / outliers, use standardization
"""
from statistics import mean, stdev

def normalization(data: list, ndigits: int = 3) -> list:
    """
    Returns a normalized list of values scaled to the range [0, 1]
    @params: data, a list of values to normalize
             ndigits, the number of decimal places to round to (default 3)
    @returns: a list of normalized values (rounded to ndigits decimal places)
    @examples:
    >>> normalization([2, 7, 10, 20, 30, 50])
    [0.0, 0.104, 0.167, 0.375, 0.583, 1.0]
    >>> normalization([5, 10, 15, 20, 25])
    [0.0, 0.25, 0.5, 0.75, 1.0]
    """
    # variables for calculation
    x_min = min(data)
    x_max = max(data)
    # normalize data
    # note: raises ZeroDivisionError if all values are equal (x_max == x_min)
    return [round((x - x_min) / (x_max - x_min), ndigits) for x in data]
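The module docstring mentions the [-1, 1] range, but `normalization` only scales to [0, 1] and divides by zero when every value is the same. Here is a minimal sketch of one way to handle both cases. The helper name `scale_to_range`, its `lower` and `upper` parameters, and the choice to map constant data to the lower bound are assumptions for illustration and are not part of this commit.

```python
def scale_to_range(
    data: list, lower: float = -1.0, upper: float = 1.0, ndigits: int = 3
) -> list:
    """Hypothetical helper: min-max scale data to the range [lower, upper]."""
    if not data:
        return []
    x_min = min(data)
    x_max = max(data)
    span = x_max - x_min
    if span == 0:
        # Assumption: constant data is mapped to the lower bound
        # instead of raising ZeroDivisionError.
        return [round(float(lower), ndigits) for _ in data]
    # Same formula as normalization(), stretched to width (upper - lower)
    # and shifted so that x_min lands on lower.
    return [
        round(lower + (x - x_min) * (upper - lower) / span, ndigits) for x in data
    ]


print(scale_to_range([5, 10, 15, 20, 25]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

With `lower=0.0` and `upper=1.0` this sketch reduces to the same formula that `normalization` uses.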


def standardization(data: list, ndigits: int = 3) -> list:
    """
    Returns a standardized list of values (z-scores)
    @params: data, a list of values to standardize
             ndigits, the number of decimal places to round to (default 3)
    @returns: a list of standardized values (rounded to ndigits decimal places)
    @examples:
    >>> standardization([2, 7, 10, 20, 30, 50])
    [-0.999, -0.719, -0.551, 0.009, 0.57, 1.69]
    >>> standardization([5, 10, 15, 20, 25])
    [-1.265, -0.632, 0.0, 0.632, 1.265]
    """
    # variables for calculation
    mu = mean(data)
    # statistics.stdev is the sample standard deviation (divides by n - 1)
    # and requires at least two data points
    sigma = stdev(data)
    # standardize data
    # note: raises ZeroDivisionError if all values are equal (sigma == 0)
    return [round((x - mu) / sigma, ndigits) for x in data]
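`standardization` uses `statistics.stdev`, which is the sample standard deviation (dividing by n - 1). Some tools instead use the population standard deviation (dividing by n), which gives slightly larger z-scores on small lists. The sketch below compares the two. The function name `standardize` and its `population` flag are hypothetical and are not part of this commit.

```python
from statistics import mean, pstdev, stdev


def standardize(data: list, population: bool = False, ndigits: int = 3) -> list:
    """Hypothetical variant: z-scores using sample or population deviation."""
    mu = mean(data)
    # pstdev divides by n, stdev divides by n - 1
    sigma = pstdev(data) if population else stdev(data)
    return [round((x - mu) / sigma, ndigits) for x in data]


values = [5, 10, 15, 20, 25]
print(standardize(values))                   # [-1.265, -0.632, 0.0, 0.632, 1.265]
print(standardize(values, population=True))  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

The gap between the two shrinks as the list grows, so the choice matters mostly for short inputs like the doctest examples.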
