Commit f810b73

datamining overview
1 parent 1d2f484 commit f810b73

3 files changed (+23, -0 lines)

README.md (+1 line)

@@ -90,6 +90,7 @@ Excerpts from the [Foreword](./docs/foreword_ro.pdf) and [Preface](./docs/prefac
 - [How does the random forest model work? How is it different from bagging and boosting in ensemble models?](./faq/bagging-boosting-rf.md)
 - [What are the disadvantages of using classic decision tree algorithm for large dataset?](./faq/decision-tree-disadvantages.md)
 - [Is it always better to have the largest possible number of folds when performing cross validation?](./faq/number-of-kfolds.md)
+- [What are the different fields of study in data mining?](./faq/datamining-overview.md)
 - [Why are implementations of decision tree algorithms usually binary, and what are the advantages of the different impurity metrics?](./faq/decision-tree-binary.md)
 - [What is the probabilistic interpretation of regularized logistic regression?](./faq/probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./faq/visual-backpropagation.md)

faq/README.md (+1 line)

@@ -33,6 +33,7 @@ Sebastian
 - [How does the random forest model work? How is it different from bagging and boosting in ensemble models?](./bagging-boosting-rf.md)
 - [What are the disadvantages of using classic decision tree algorithm for large dataset?](./decision-tree-disadvantages.md)
 - [Is it always better to have the largest possible number of folds when performing cross validation?](./number-of-kfolds.md)
+- [What are the different fields of study in data mining?](./datamining-overview.md)
 - [Why are implementations of decision tree algorithms usually binary, and what are the advantages of the different impurity metrics?](./decision-tree-binary.md)
 - [What is the probabilistic interpretation of regularized logistic regression?](./probablistic-logistic-regression.md)
 - [Can you give a visual explanation for the back propagation algorithm for neural networks?](./visual-backpropagation.md)

faq/datamining-overview.md (+21 lines, new file)

@@ -0,0 +1,21 @@
+# What are the different fields of study in data mining?
+
+
+I would roughly divide data mining into the following areas:
+
+1) Clustering (unsupervised learning)
+e.g., finding groups of customers based on some similarity measure
+
+2) Predictive modeling (supervised learning)
+2.1) Classification
+e.g., medical diagnosis (sick/healthy), image classification, etc.
+2.2) Regression
+e.g., predicting changes in stock prices
+2.3) Ranking
+e.g., ordering search engine results
+
+3) Association rule mining
+e.g., finding which products customers frequently buy together
+
+4) Anomaly detection
+e.g., credit card fraud detection
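
To make item 3 (association rule mining) more concrete, here is a minimal Python sketch that scores "customers who buy A also buy B" rules by support and confidence. It is a hypothetical illustration, not part of this commit: the `pair_rules` function, its thresholds, and the toy `baskets` data are invented for the example, and it brute-forces single-item pairs rather than implementing a full algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Hypothetical helper: find single-item rules (lhs -> rhs).

    support(a -> b)    = fraction of transactions containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        # sorted() gives each unordered pair a single canonical key
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair can yield a rule in either direction
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy market-basket data (invented for illustration)
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

for lhs, rhs, s, c in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

On this toy data, the sketch reports `milk -> butter` with confidence 1.00 (every basket with milk also contains butter), while `butter -> milk` only reaches 0.67. Confidence is directional even though support is symmetric.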
