forked from alirezadir/Machine-Learning-Interviews
Alireza Dirafzoon committed on Mar 16, 2023 · 1 parent bf182d5 · commit 4346ffd
Showing 6 changed files with 540 additions and 2 deletions.
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"A decision tree is a type of machine learning algorithm used for classification and regression tasks. It consists of a tree-like structure where each internal node represents a feature or attribute, each branch represents a decision based on that feature, and each leaf node represents a predicted output.\n", | ||
"\n", | ||
"To **train** a decision tree, the algorithm uses a dataset with labeled examples to create the tree structure. It starts with the root node, which includes all the examples, and selects the feature that provides the most information gain to split the data into two subsets. It then repeats this process for each subset until it reaches a stopping criterion, such as a maximum tree depth or minimum number of examples in a leaf node.\n", | ||
"\n", | ||
"Once the decision tree is trained, it can be used to **predict** the output for new, unseen examples. To make a prediction, the algorithm starts at the root node and follows the branches based on the values of the input features until it reaches a leaf node. The predicted output for that example is the value associated with the leaf node.\n", | ||
"\n", | ||
"Decision trees have several advantages, such as being easy to interpret and visualize, handling both numerical and categorical data, and handling missing values. However, they can also suffer from overfitting if the tree is too complex or if there is noise or outliers in the data. \n", | ||
"\n", | ||
"To address this issue, various techniques such as pruning, ensemble methods, and regularization can be used to simplify the decision tree or combine multiple trees to improve generalization performance. Additionally, decision trees may not perform well with highly imbalanced datasets or datasets with many irrelevant features, and they may not be suitable for tasks where the relationships between features and outputs are highly nonlinear or complex." | ||
] | ||
}, | ||
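{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of the splitting criterion described above, the next cell (a minimal sketch added for clarity, not part of the original implementation) computes the Gini impurity of a parent node and of a candidate split by hand. The labels and the split are made up for the example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def gini(y):\n",
"    # Gini impurity: 1 - sum of squared class proportions\n",
"    _, counts = np.unique(y, return_counts=True)\n",
"    p = counts / counts.sum()\n",
"    return 1.0 - np.sum(p ** 2)\n",
"\n",
"# Toy labels: 4 examples of class 0 and 4 of class 1 (illustrative data)\n",
"y_parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])\n",
"\n",
"# A candidate split that separates the classes imperfectly\n",
"y_left = np.array([0, 0, 0, 1])\n",
"y_right = np.array([0, 1, 1, 1])\n",
"\n",
"# Weighted impurity of the split; a good split drives this below the parent's impurity\n",
"weighted = (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / len(y_parent)\n",
"\n",
"print(f'parent Gini:   {gini(y_parent):.3f}')   # 0.500\n",
"print(f'weighted Gini: {weighted:.3f}')         # 0.375\n"
]
},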
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"class DecisionTree:\n", | ||
" def __init__(self, max_depth=None):\n", | ||
" self.max_depth = max_depth\n", | ||
" \n", | ||
" def fit(self, X, y):\n", | ||
" self.n_classes_ = len(np.unique(y))\n", | ||
" self.n_features_ = X.shape[1]\n", | ||
" self.tree_ = self._grow_tree(X, y)\n", | ||
" \n", | ||
" def predict(self, X):\n", | ||
" return [self._predict(inputs) for inputs in X]\n", | ||
" \n", | ||
" def _gini(self, y):\n", | ||
" _, counts = np.unique(y, return_counts=True)\n", | ||
" impurity = 1 - np.sum([(count / len(y)) ** 2 for count in counts])\n", | ||
" return impurity\n", | ||
" \n", | ||
" def _best_split(self, X, y):\n", | ||
" m = y.size\n", | ||
" if m <= 1:\n", | ||
" return None, None\n", | ||
" \n", | ||
" num_parent = [np.sum(y == c) for c in range(self.n_classes_)]\n", | ||
" best_gini = 1.0 - sum((n / m) ** 2 for n in num_parent)\n", | ||
" best_idx, best_thr = None, None\n", | ||
" \n", | ||
" for idx in range(self.n_features_):\n", | ||
" thresholds, classes = zip(*sorted(zip(X[:, idx], y)))\n", | ||
" num_left = [0] * self.n_classes_\n", | ||
" num_right = num_parent.copy()\n", | ||
" for i in range(1, m):\n", | ||
" c = classes[i - 1]\n", | ||
" num_left[c] += 1\n", | ||
" num_right[c] -= 1\n", | ||
" gini_left = 1.0 - sum(\n", | ||
" (num_left[x] / i) ** 2 for x in range(self.n_classes_)\n", | ||
" )\n", | ||
" gini_right = 1.0 - sum(\n", | ||
" (num_right[x] / (m - i)) ** 2 for x in range(self.n_classes_)\n", | ||
" )\n", | ||
" gini = (i * gini_left + (m - i) * gini_right) / m\n", | ||
" if thresholds[i] == thresholds[i - 1]:\n", | ||
" continue\n", | ||
" if gini < best_gini:\n", | ||
" best_gini = gini\n", | ||
" best_idx = idx\n", | ||
" best_thr = (thresholds[i] + thresholds[i - 1]) / 2\n", | ||
" \n", | ||
" return best_idx, best_thr\n", | ||
" \n", | ||
" def _grow_tree(self, X, y, depth=0):\n", | ||
" num_samples_per_class = [np.sum(y == i) for i in range(self.n_classes_)]\n", | ||
" predicted_class = np.argmax(num_samples_per_class)\n", | ||
" node = Node(predicted_class=predicted_class)\n", | ||
" if depth < self.max_depth:\n", | ||
" idx, thr = self._best_split(X, y)\n", | ||
" if idx is not None:\n", | ||
" indices_left = X[:, idx] < thr\n", | ||
" X_left, y_left = X[indices_left], y[indices_left]\n", | ||
" X_right, y_right = X[~indices_left], y[~indices_left]\n", | ||
" node.feature_index = idx\n", | ||
" node.threshold = thr\n", | ||
" node.left = self._grow_tree(X_left, y_left, depth + 1)\n", | ||
" node.right = self._grow_tree(X_right, y_right, depth + 1)\n", | ||
" return node\n", | ||
" \n", | ||
" def _predict(self, inputs):\n", | ||
" node = self.tree_\n", | ||
" while node.left:\n", | ||
" if inputs[node.feature_index] < node.threshold:\n", | ||
" node = node.left\n", | ||
" else:\n", | ||
" node = node.right\n", | ||
" return node.predicted_class\n", | ||
" \n", | ||
"class Node:\n", | ||
" def __init__(self, *, predicted_class):\n", | ||
" self.predicted_class = predicted_class\n", | ||
" self.feature_index = 0\n", | ||
" self.threshold = 0.0 \n", | ||
" self.left = None\n", | ||
" self.right = None\n", | ||
"\n", | ||
" def is_leaf_node(self):\n", | ||
" return self.left is None and self.right is None\n", | ||
"\n", | ||
"\n", | ||
"\n" | ||
] | ||
}, | ||
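{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"The prediction step can also be seen in isolation. The cell below (a minimal sketch added for illustration, using made-up thresholds rather than anything learned from data) builds a one-split tree by hand with the `Node` class and shows how `predict` routes two inputs to different leaves."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hand-built tree: split on feature 0 at threshold 0.5 (illustrative values).\n",
"root = Node(predicted_class=0)\n",
"root.feature_index = 0\n",
"root.threshold = 0.5\n",
"root.left = Node(predicted_class=0)   # reached when x[0] < 0.5\n",
"root.right = Node(predicted_class=1)  # reached when x[0] >= 0.5\n",
"\n",
"toy = DecisionTree(max_depth=1)\n",
"toy.tree_ = root  # attach the hand-built tree instead of calling fit\n",
"\n",
"print(toy.predict(np.array([[0.2], [0.9]])))  # expected: [0, 1]\n"
]
},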
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Test "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 1.0\n"
]
}
],
"source": [ | ||
"from sklearn.datasets import load_iris\n", | ||
"from sklearn.model_selection import train_test_split\n", | ||
"from sklearn.metrics import accuracy_score\n", | ||
"\n", | ||
"# Load the iris dataset\n", | ||
"iris = load_iris()\n", | ||
"X = iris.data\n", | ||
"y = iris.target\n", | ||
"\n", | ||
"# Split the data into training and testing sets\n", | ||
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", | ||
"\n", | ||
"# Train the decision tree\n", | ||
"tree = DecisionTree(max_depth=3)\n", | ||
"tree.fit(X_train, y_train)\n", | ||
"\n", | ||
"# Make predictions on the test set\n", | ||
"y_pred = tree.predict(X_test)\n", | ||
"\n", | ||
"# Compute the accuracy of the predictions\n", | ||
"accuracy = accuracy_score(y_test, y_pred)\n", | ||
"\n", | ||
"print(f\"Accuracy: {accuracy}\")\n" | ||
] | ||
}, | ||
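{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the regularization point made earlier, the cell below (an added sketch, not part of the original notebook) refits the same tree with several `max_depth` values and compares train and test accuracy; a large gap between the two is a sign of overfitting. Exact numbers depend on the random split."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare train vs. test accuracy for different depth limits on the same Iris split.\n",
"for depth in [1, 3, 10]:\n",
"    model = DecisionTree(max_depth=depth)\n",
"    model.fit(X_train, y_train)\n",
"    train_acc = accuracy_score(y_train, model.predict(X_train))\n",
"    test_acc = accuracy_score(y_test, model.predict(X_test))\n",
"    print(f'max_depth={depth}: train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}')\n"
]
},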
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}