Commit a1d1a44

added idf-smooth (TheAlgorithms#2174)
* added idf-smooth * added idf-smooth * added idf-smooth
1 parent e92e433 commit a1d1a44

1 file changed, +12 -4 lines changed

machine_learning/word_frequency_functions.py

@@ -83,16 +83,17 @@ def document_frequency(term: str, corpus: str) -> int:
     return (len([doc for doc in docs if term in doc]), len(docs))


-def inverse_document_frequency(df: int, N: int) -> float:
+def inverse_document_frequency(df: int, N: int, smoothing=False) -> float:
     """
     Return an integer denoting the importance
     of a word. This measure of importance is
     calculated by log10(N/df), where N is the
     number of documents and df is
     the Document Frequency.
-    @params : df, the Document Frequency, and N,
-    the number of documents in the corpus.
-    @returns : log10(N/df)
+    @params : df, the Document Frequency, N,
+    the number of documents in the corpus and
+    smoothing, if True return the idf-smooth
+    @returns : log10(N/df) or 1+log10(N/1+df)
     @examples :
     >>> inverse_document_frequency(3, 0)
     Traceback (most recent call last):
@@ -104,7 +105,14 @@ def inverse_document_frequency(df: int, N: int) -> float:
     Traceback (most recent call last):
      ...
     ZeroDivisionError: df must be > 0
+    >>> inverse_document_frequency(0, 3,True)
+    1.477
     """
+    if smoothing:
+        if N == 0:
+            raise ValueError("log10(0) is undefined.")
+        return round(1 + log10(N / (1 + df)), 3)
+
     if df == 0:
         raise ZeroDivisionError("df must be > 0")
     elif N == 0:
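
For readers who want to try the smoothed variant outside the repository, the following is a minimal standalone sketch of the behaviour this diff introduces. The name `idf_sketch` is hypothetical and is not the repository's function. The diff is cut off after `elif N == 0:`, so the unsmoothed branch below (raising `ValueError` and rounding to three places) is an assumption rather than the committed code. Note that the docstring writes the smoothed formula as `1+log10(N/1+df)`, while the added code computes `1 + log10(N / (1 + df))`; the sketch follows the code.

```python
from math import log10


def idf_sketch(df: int, n_docs: int, smoothing: bool = False) -> float:
    """Hypothetical stand-in for inverse_document_frequency after this commit."""
    if smoothing:
        # Smoothed branch, as added in the diff: df == 0 is allowed here
        # because the denominator is 1 + df.
        if n_docs == 0:
            raise ValueError("log10(0) is undefined.")
        return round(1 + log10(n_docs / (1 + df)), 3)

    if df == 0:
        raise ZeroDivisionError("df must be > 0")
    if n_docs == 0:
        # Assumption: the truncated part of the diff rejects an empty corpus.
        raise ValueError("log10(0) is undefined.")
    # Assumption: the unsmoothed result is rounded the same way.
    return round(log10(n_docs / df), 3)


# Same inputs as the doctest added in the diff.
print(idf_sketch(0, 3, smoothing=True))  # 1.477
```

With smoothing enabled, a term that appears in no document (`df == 0`) gets a finite score instead of raising `ZeroDivisionError`, which appears to be the point of the change.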
