A decision tree is a graphical representation of all the possible solutions to a decision based on certain conditions. It is a supervised learning algorithm that can be used for both classification and regression problems, though it is most often preferred for classification. A decision tree is a flowchart-like tree structure in which each internal node represents a feature of the dataset, each branch represents a decision rule, and each leaf node represents an outcome.

The purpose of a decision tree is to create a training model that can predict the class or value of a target variable by learning simple decision rules inferred from the training data.

**Decision Tree Terminologies**

- **Root node:** The node from which the decision tree starts. It represents the entire population (dataset), which then gets divided into two or more homogeneous sets.
- **Leaf node:** A final output node; the tree cannot be split further after a leaf node.
- **Splitting:** The process of dividing a node into two or more sub-nodes.
- **Branch/Sub-tree:** A subtree of the main tree is called a branch or sub-tree.
- **Pruning:** The process of removing unwanted branches from the tree.
- **Parent/Child node:** A node that is divided into sub-nodes is called the parent of those sub-nodes, and the sub-nodes are called children of the parent node.

**How the Decision Tree Algorithm Works**

Decision trees use various algorithms to choose the root node and to split a node into sub-nodes. One widely used example is the ID3 (Iterative Dichotomiser 3) algorithm.

To understand the ID3 algorithm, we first need to know two concepts: Entropy (H) and Information Gain (G).

## Entropy

In data science, entropy is used to measure how "mixed" a column is; specifically, it measures disorder. Through entropy, we can tell how well splitting a dataset on a feature separates out the target variable. Partitioning means grouping the rows by the values of a feature and observing the target variable within each group.

## The formula of Entropy

`H(S) = −(P₊ log₂ P₊ + P₋ log₂ P₋)`

Where:

H(S) = entropy of the current dataset S

P₊ = probability (proportion) of the positive (Yes) class in S

P₋ = probability (proportion) of the negative (No) class in S
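As a minimal sketch in plain Python, the formula above can be applied to a column of class labels. The 4-"No"/3-"Yes" split used here matches the Play column of the worked example later in this article:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # H(S) = -sum over classes of p * log2(p)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Play column of the worked example: 4 "No" and 3 "Yes"
play = ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes']
print(round(entropy(play), 4))  # → 0.9852
```

A value near 1 means the column is highly mixed; a column with only one class has entropy 0.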

## Information Gain

Information gain measures the change in entropy after a dataset is split on an attribute. It tells us how much information a feature provides about the class. The feature with the highest information gain is selected as the root (or splitting) node.

**The formula of Gain:**

`Gain(S, F) = H(S) − Σ_{v ∈ F} P(v) × H(S_v)`

Where:

S = the target dataset

F = the feature being evaluated

v ∈ F = each value of feature F

S_v = the subset of S in which feature F takes the value v

H(S_v) = entropy of the subset S_v

P(v) = probability (proportion) of rows in S for which feature F takes the value v
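A minimal sketch of this formula in plain Python, evaluated on the Outlook and Play columns of the worked example later in this article:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(feature_values, labels):
    """Gain(S, F) = H(S) - sum over v of P(v) * H(S_v)."""
    total = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        # S_v: labels of the rows where the feature takes value v
        subset = [lbl for f, lbl in zip(feature_values, labels) if f == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

# Outlook and Play columns of the worked example
outlook = ['Sunny', 'Sunny', 'Cloudy', 'Sunny', 'Cloudy', 'Cloudy', 'Sunny']
play = ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes']
print(round(information_gain(outlook, play), 4))  # → 0.1281
```

ID3 computes this gain for every feature and splits on the one with the highest value.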

## Applications of Decision Trees

- **Healthcare industries:** A decision tree can tell whether a patient is suffering from a disease based on weight, sex, age, and other factors.
- **Educational sector:** Whether a student in a school, college, or university is eligible for a scholarship can be decided with a decision tree based on results, financial status, family income, etc.
- **Banking sector:** Whether a person is eligible for a loan can be decided with a decision tree based on salary, family members, financial status, etc.

**Implementation of this example using Python (Jupyter notebook)**

**Step 1: Read dataset through pandas**

```
import pandas as pd
dataset = pd.read_csv('data.csv')
dataset
```

| Day | Outlook | Temperature | Routine | Play |
|------|--------|-------------|---------|------|
| Day1 | Sunny  | Cold        | Indoor  | No   |
| Day2 | Sunny  | Warm        | Outdoor | No   |
| Day3 | Cloudy | Warm        | Indoor  | No   |
| Day4 | Sunny  | Warm        | Indoor  | No   |
| Day5 | Cloudy | Cold        | Indoor  | Yes  |
| Day6 | Cloudy | Cold        | Outdoor | Yes  |
| Day7 | Sunny  | Cold        | Outdoor | Yes  |
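The `data.csv` file itself is not included here; as a sketch, the same table can be built directly with pandas so the rest of the steps are reproducible:

```python
import pandas as pd

# Rebuild the example table directly (equivalent to reading data.csv)
dataset = pd.DataFrame({
    'Day': ['Day1', 'Day2', 'Day3', 'Day4', 'Day5', 'Day6', 'Day7'],
    'Outlook': ['Sunny', 'Sunny', 'Cloudy', 'Sunny', 'Cloudy', 'Cloudy', 'Sunny'],
    'Temperature': ['Cold', 'Warm', 'Warm', 'Warm', 'Cold', 'Cold', 'Cold'],
    'Routine': ['Indoor', 'Outdoor', 'Indoor', 'Indoor', 'Indoor', 'Outdoor', 'Outdoor'],
    'Play': ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes'],
})
print(dataset.shape)  # → (7, 5)
```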

**Step 2: Check missing value, if any then handle it**

```
dataset.isnull().sum()
#there is no null value.
# output
Day 0
Outlook 0
Temperature 0
Routine 0
Play 0
dtype: int64
```
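Here there are no missing values, but if there were, two common ways to handle them are dropping the affected rows or filling them in. A minimal sketch on a hypothetical frame (not part of the example dataset):

```python
import pandas as pd

# Hypothetical frame with one missing Outlook value, for illustration only
df = pd.DataFrame({'Outlook': ['Sunny', None, 'Cloudy'],
                   'Play': ['No', 'Yes', 'Yes']})
print(int(df['Outlook'].isnull().sum()))  # → 1

# Option 1: drop rows containing missing values
dropped = df.dropna()

# Option 2: fill missing entries with the column's most frequent value
filled = df.fillna({'Outlook': df['Outlook'].mode()[0]})
print(int(filled['Outlook'].isnull().sum()))  # → 0
```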

**Step 3: Data preprocessing**

The model cannot work with string values, so we need to convert strings to numeric values; that is why preprocessing is required. There are many ways to preprocess data; one of them is label encoding from **sklearn**.

```
from sklearn.preprocessing import LabelEncoder
x = dataset[['Outlook', 'Temperature', 'Routine']].apply(LabelEncoder().fit_transform)
x
```

| Outlook | Temperature | Routine |
|---------|-------------|---------|
| 1 | 0 | 0 |
| 1 | 1 | 1 |
| 0 | 1 | 0 |
| 1 | 1 | 0 |
| 0 | 0 | 0 |
| 0 | 0 | 1 |
| 1 | 0 | 1 |
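`LabelEncoder` assigns integers to categories in alphabetical order, which is why Cloudy becomes 0 and Sunny becomes 1. A small sketch to inspect the mapping for one column:

```python
from sklearn.preprocessing import LabelEncoder

# Fit an encoder on the Outlook column and print its category-to-number mapping
le = LabelEncoder()
le.fit(['Sunny', 'Sunny', 'Cloudy', 'Sunny', 'Cloudy', 'Cloudy', 'Sunny'])
mapping = {cls: int(code) for cls, code in zip(le.classes_, le.transform(le.classes_))}
print(mapping)  # → {'Cloudy': 0, 'Sunny': 1}
```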

**Step 4: Create a DecisionTreeClassifier model and train it**

```
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(x, dataset.Play)
# Output
DecisionTreeClassifier()
```

**Step 5: Predict using new data**

```
import numpy as np
# 1 -> Sunny, 0 -> Cold, 1 -> Outdoor; according to the preprocessed table
x_test = np.array([[1, 0, 1]])
model.predict(x_test)[0]
# Output
'Yes'
```
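To see which splits the trained tree actually learned, scikit-learn's `export_text` helper prints the decision rules. A self-contained sketch that rebuilds the encoded data from the example above:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded features and target from the worked example
x = pd.DataFrame({'Outlook':     [1, 1, 0, 1, 0, 0, 1],
                  'Temperature': [0, 1, 1, 1, 0, 0, 0],
                  'Routine':     [0, 1, 0, 0, 0, 1, 1]})
y = ['No', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes']

model = DecisionTreeClassifier(random_state=0)
model.fit(x, y)

# Print the learned rules as indented text
print(export_text(model, feature_names=list(x.columns)))
```

On this data the first split is on Temperature (all Warm days are "No"), which matches the intuition that the feature with the best split is chosen first.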