Decision Trees

The need for decision trees arises from the need for classification. Classification is a data mining term for predicting group membership of data instances. For example, someone might want to know whether there will be rain on a particular day.

Let us understand decision trees.

A decision tree is a decision-support tool that uses a tree-like graph or model of decisions and their possible consequences, including costs, outcomes, and utility. It is one way of displaying an algorithm.

In simple terms, a decision tree is like a flowchart structure in which each internal node represents a “test” on an attribute, each branch shows an outcome of the test, and each leaf node shows the decision taken after the outcomes. The paths from root to leaf are known as classification rules.
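
As a toy illustration, here is a minimal sketch in Python for the rain example. The attribute names and the humidity threshold are made up for illustration: the nested if statements play the role of internal nodes testing attributes, and each return is a leaf holding the decision.

```python
def will_it_rain(outlook: str, humidity: float) -> str:
    # Root node: test the `outlook` attribute; each branch is an outcome.
    if outlook == "overcast":
        return "rain"                  # leaf node: final decision
    if outlook == "sunny":
        # Internal node: test the `humidity` attribute.
        if humidity > 0.8:
            return "rain"
        return "no rain"
    return "no rain"                   # leaf for any other outlook

print(will_it_rain("sunny", 0.9))      # -> rain
```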

There are three kinds of nodes in a decision tree –

1. Decision Nodes – commonly represented by squares.

2. Chance Nodes – represented by circles.

3. End Nodes – represented by triangles.
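
Here is a minimal sketch, with made-up payoffs and probabilities, of how a tree built from these three node types is evaluated by “rolling back” expected values from the end nodes.

```python
# Chance nodes are averaged by probability; decision nodes take the best
# alternative; end nodes carry a terminal payoff. All numbers are invented.
def expected_value(node):
    if node["kind"] == "end":        # triangle: terminal payoff
        return node["payoff"]
    if node["kind"] == "chance":     # circle: probability-weighted average
        return sum(p * expected_value(child) for p, child in node["branches"])
    # "decision" (square): pick the alternative with the best expected value
    return max(expected_value(child) for _, child in node["branches"])

tree = {"kind": "decision", "branches": [
    ("launch product", {"kind": "chance", "branches": [
        (0.6, {"kind": "end", "payoff": 100}),   # strong demand
        (0.4, {"kind": "end", "payoff": -30}),   # weak demand
    ]}),
    ("do nothing", {"kind": "end", "payoff": 0}),
]}

print(expected_value(tree))   # 0.6 * 100 + 0.4 * (-30) = 48, so launching wins
```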

When Are Decision Trees Useful?

Decision trees are particularly helpful when –

  1. Perfect information is available.
  2. You need to formulate a conditional values table.
  3. An opportunity loss table is available.
  4. A sequence of decisions needs to be made.
  5. Not all possible outcomes and alternatives are known in advance.

They are most commonly used in operations research and operations management. They are even used for calculating conditional probabilities.
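
To make the conditional-probability point concrete, here is a small worked sketch with made-up numbers: probabilities are multiplied along each root-to-leaf path of a probability tree, and Bayes’ rule then gives the conditional probability of interest.

```python
# Branch first on the weather, then on whether the forecast said "rain".
# All probabilities below are invented for illustration.
p_rain = 0.3
p_forecast_given_rain = 0.9   # forecast says rain when it actually rains
p_forecast_given_dry = 0.2    # false-alarm rate on dry days

# Multiply along each root-to-leaf path, then apply Bayes' rule:
p_forecast = p_rain * p_forecast_given_rain + (1 - p_rain) * p_forecast_given_dry
p_rain_given_forecast = p_rain * p_forecast_given_rain / p_forecast
print(round(p_rain_given_forecast, 3))   # 0.27 / 0.41 ≈ 0.659
```
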
Decision Rules – A decision tree is effectively a flowchart and can be linearized into decision rules: the outcome of each rule is the content of a leaf node, and the conditions along the path from the root to that leaf form a conjunction in the if clause.
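
As a minimal sketch of that linearization, the hypothetical function below walks a small hand-built tree and emits one rule per root-to-leaf path, AND-ing the conditions met along the way.

```python
def tree_to_rules(node, conditions=()):
    # Leaf reached: the collected conditions form the if clause of one rule.
    if "leaf" in node:
        yield "IF " + " AND ".join(conditions) + f" THEN {node['leaf']}"
        return
    # Internal node: recurse into each branch, extending the conjunction.
    for outcome, child in node["branches"].items():
        yield from tree_to_rules(child, conditions + (f"{node['test']} = {outcome}",))

tree = {"test": "outlook", "branches": {
    "sunny": {"test": "humidity", "branches": {
        "high": {"leaf": "rain"},
        "normal": {"leaf": "no rain"},
    }},
    "overcast": {"leaf": "rain"},
}}

for rule in tree_to_rules(tree):
    print(rule)
# IF outlook = sunny AND humidity = high THEN rain
# IF outlook = sunny AND humidity = normal THEN no rain
# IF outlook = overcast THEN rain
```
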
Advantages of Decision Trees –

  1. When we fit a decision tree to a training dataset, the top few nodes on which the tree splits are essentially the most important variables in the dataset, so we get feature selection automatically, without any further ado (a short sketch of this appears after this list).
  2. They require relatively little data preparation, for two simple reasons – 1. The tree structure is unaffected by monotonic transformations of the data, so there is little need for variable transformation. 2. Missing values do not prevent the tree from splitting the data.
  3. Trees don’t require the data to be linearly related; we can use them even when we know the predictors and the response are non-linearly related.
  4. Last and best of all, decision trees are easy to explain and understand.
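
As a short sketch of the first advantage, the snippet below fits a tree with scikit-learn (assumed installed) on the bundled iris dataset and reads off which variables the important splits used.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Higher scores mean the attribute was used in more important splits.
for name, score in zip(load_iris().feature_names, clf.feature_importances_):
    print(f"{name}: {score:.3f}")
```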

Disadvantages of Decision Trees –

  1. Calculations can become very complex if many values are uncertain and/or if many outcomes are linked.
  2. For data that include categorical variables with different numbers of levels, information gain is biased in favor of the attributes with more levels (see the sketch below).
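
The sketch below, with made-up data, shows the bias: an ID-like attribute with a unique value per row achieves the maximum possible information gain even though it generalizes poorly.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain = entropy of the labels minus the size-weighted entropy of the
    # subsets produced by splitting on `values`.
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [label for x, label in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

labels  = ["yes", "yes", "no", "no", "yes", "no"]
weather = ["sun", "sun", "rain", "rain", "sun", "sun"]   # 2 levels
row_id  = list(range(6))                                 # 6 levels, one per row

print(information_gain(weather, labels))   # ≈ 0.46: modest gain
print(information_gain(row_id, labels))    # 1.0: maximal gain, yet useless
```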

Happy Learning 🙂


