/ˌɪn.fərˈmeɪ.ʃən ɡeɪn/

noun … “measuring how much a split enlightens.”

Information Gain is a metric used in decision tree learning and other machine learning algorithms to quantify the reduction in uncertainty (entropy) about a target variable after observing a feature. It measures how much knowing the value of a specific predictor improves prediction of the outcome, guiding the selection of the most informative features when constructing decision trees.

Formally, Information Gain is computed as the difference between the entropy of the original dataset and the weighted sum of entropies of partitions induced by the feature:

IG(Y, X) = H(Y) − Σᵢ P(X = xᵢ)·H(Y | X = xᵢ)

Here, H(Y) represents the entropy of the target variable Y, X is the feature being considered, P(X = xᵢ) is the probability of the ith value of X, and H(Y | X = xᵢ) is the entropy of Y within the subset of the data where X takes the value xᵢ. By evaluating Information Gain for all candidate features, the algorithm chooses splits that maximize the reduction in uncertainty, producing a tree that efficiently partitions the data.
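As a hypothetical worked example: suppose Y is a balanced binary target over 10 samples, so H(Y) = 1 bit. A feature X splits them into a pure partition of 4 samples (H = 0) and a mixed partition of 6 samples with 1 positive and 5 negatives (H ≈ 0.650 bits). Then IG(Y, X) = 1 − (0.4·0 + 0.6·0.650) ≈ 0.61 bits.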

Information Gain is closely connected to several core concepts in machine learning and statistics. It relies on Entropy to quantify uncertainty, draws on Probability Distributions to weight the partitions, and sits alongside alternative split criteria such as Gini Impurity. It is central to algorithms such as ID3 and C4.5 (which normalizes it into the gain ratio to avoid favoring many-valued features), and it can serve as the split criterion in ensembles such as Random Forests, where the quality of per-node feature selection shapes predictive accuracy and tree interpretability.
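To see how the two split criteria relate, here is a minimal sketch comparing entropy with Gini impurity on a binary class distribution [p, 1 − p]; the function names are illustrative, not from any particular library. Both measures reach zero for pure nodes and peak when the classes are evenly mixed.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a binary class distribution [p, 1 - p]."""
    if p in (0.0, 1.0):
        return 0.0  # a pure node has zero uncertainty
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini(p):
    """Gini impurity of the same distribution: 1 - p^2 - (1 - p)^2."""
    return 1 - p**2 - (1 - p)**2

for p in (0.0, 0.25, 0.5, 1.0):
    print(f"p={p:.2f}  entropy={entropy(p):.3f}  gini={gini(p):.3f}")
```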

Example conceptual workflow for calculating Information Gain:

1. Collect a dataset with target and predictor variables.
2. Compute the entropy of the target variable.
3. For each feature, partition the dataset by the feature's values.
4. Compute the entropy of each partition, weighted by the partition's probability.
5. Subtract the weighted sum from the original entropy to get the Information Gain.
6. Select the feature with the highest Information Gain for splitting.
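The sketch below turns this workflow into runnable Python under some simplifying assumptions: a small in-memory dataset of feature dictionaries with categorical values, and illustrative helper names (entropy, information_gain, best_feature) that are not from any standard library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """IG(Y, X) = H(Y) - sum_i P(X = x_i) * H(Y | X = x_i)."""
    total = len(labels)
    # Partition the labels by the value this feature takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    weighted = sum((len(part) / total) * entropy(part)
                   for part in partitions.values())
    return entropy(labels) - weighted

def best_feature(rows, labels, features):
    """Choose the split feature with the highest Information Gain."""
    return max(features, key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: predict the label from two categorical features.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "no", "yes", "no"]
print(best_feature(rows, labels, ["outlook", "windy"]))
# -> "windy" (it perfectly separates the labels, so IG = 1 bit;
#    "outlook" leaves both partitions mixed, so IG = 0)
```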

Intuitively, Information Gain is like shining a spotlight into a dark room: each feature you consider lights up part of the room, dispelling some of the uncertainty and revealing patterns and distinctions. The more a feature clarifies, the higher its gain, guiding you toward the clearest path to understanding and predicting outcomes in complex datasets.