/ˈɡreɪ.di.ənt ˈbuː.stɪŋ/

noun … “learning from mistakes, one step at a time.”

Gradient Boosting is an ensemble machine learning technique that builds predictive models sequentially, where each new model attempts to correct the errors of its predecessors. It combines multiple weak learners, typically shallow Decision Trees, into a strong learner by minimizing a differentiable loss function via gradient descent in function space. This approach allows Gradient Boosting to achieve high accuracy on regression and classification tasks while capturing complex patterns in the data.
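
As a concrete starting point, here is a minimal sketch using scikit-learn's GradientBoostingRegressor; the synthetic dataset and hyperparameter values are illustrative assumptions, not part of the definition.

# A minimal sketch, assuming scikit-learn is available; the data and
# hyperparameters are illustrative, not prescriptive.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    n_estimators=200,   # M: number of boosting iterations
    learning_rate=0.1,  # η: shrinkage applied to each new tree
    max_depth=2,        # shallow trees stay "weak" learners
)
model.fit(X_train, y_train)
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))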

Mathematically, given a loss function L(y, F(x)) for predictions F(x) and true outcomes y, Gradient Boosting iteratively fits a new model hₘ(x) to the negative gradient of the loss with respect to the ensemble's prediction, evaluated at the current ensemble Fₘ₋₁(x):

F₀(x) = argmin_γ Σᵢ L(yᵢ, γ)    (e.g., the mean of y under squared-error loss)
for m = 1 to M:
    compute pseudo-residuals rᵢₘ = - [∂L(yᵢ, F(xᵢ)) / ∂F(xᵢ)] evaluated at F = Fₘ₋₁
    fit weak learner hₘ(x) to the pseudo-residuals rᵢₘ
    update Fₘ(x) = Fₘ₋₁(x) + η·hₘ(x)

Here, η is the learning rate controlling the contribution of each new tree, and M is the number of boosting iterations. By sequentially addressing residual errors, the ensemble converges toward a model that minimizes the overall loss.
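
The update rule above translates almost line-for-line into code. Below is a from-scratch sketch for squared-error loss L(y, F) = ½(y − F)², whose negative gradient is simply the ordinary residual y − F; using shallow scikit-learn trees as the weak learners is an assumption for illustration.

# From-scratch sketch of the boosting loop for squared-error loss.
# For L(y, F) = (y - F)^2 / 2, the negative gradient is just y - F,
# so the pseudo-residuals are ordinary residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, M=100, eta=0.1, max_depth=2):
    F0 = y.mean()                     # F₀: the constant minimizing squared error
    F = np.full(len(y), F0)
    trees = []
    for m in range(M):
        residuals = y - F             # pseudo-residuals rᵢₘ (negative gradient)
        h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        F = F + eta * h.predict(X)    # Fₘ(x) = Fₘ₋₁(x) + η·hₘ(x)
        trees.append(h)
    return F0, trees

def predict_gradient_boosting(F0, trees, X, eta=0.1):
    F = np.full(X.shape[0], F0)
    for h in trees:
        F = F + eta * h.predict(X)
    return F

Note that η must match between fitting and prediction; a real implementation would store it alongside the trees.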

Gradient Boosting is closely connected to several core concepts in machine learning. It uses Decision Trees as base learners, relies on residuals (the negative gradients of the loss) to refine predictions, and can incorporate regularization techniques such as shrinkage and subsampling to prevent overfitting. It also complements ensemble methods like Random Forest: boosting reduces bias through sequential error correction, whereas Random Forest reduces variance through parallel aggregation of independently grown trees.
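
To make that contrast concrete, the sketch below fits both kinds of ensemble on the same synthetic data; the dataset and model settings are illustrative assumptions.

# Side-by-side sketch: Random Forest averages independently grown trees
# (parallel aggregation), while Gradient Boosting grows trees sequentially
# on residuals. Settings here are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

models = {
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R² = {score:.3f}")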

Example conceptual workflow for Gradient Boosting (a runnable sketch follows the list):

collect dataset with predictors and target
initialize model with a simple guess for F₀(x)
compute residuals from current model
fit a weak learner (e.g., small Decision Tree) to residuals
update ensemble prediction with learning rate η
repeat for M iterations (or stop early once validation error stops improving)
evaluate final ensemble model performance
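
This workflow can be traced empirically: scikit-learn's staged_predict yields the ensemble's prediction after each boosting iteration, making it easy to watch the error shrink. The data below is an illustrative assumption.

# Sketch tracing the workflow above: training error falls as boosting
# iterations accumulate. staged_predict yields the ensemble's prediction
# after each of the M iterations.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=300)

model = GradientBoostingRegressor(n_estimators=50, learning_rate=0.1, max_depth=2)
model.fit(X, y)

for m, y_pred in enumerate(model.staged_predict(X), start=1):
    if m in (1, 10, 25, 50):
        print(f"after {m:2d} trees: train MSE = {mean_squared_error(y, y_pred):.4f}")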

Intuitively, Gradient Boosting is like descending a foggy hillside using only local slope information: each step (tree) moves downhill against the loss gradient, correcting the errors of the last and gradually approaching the valley floor (minimum loss). It turns sequential improvement into a powerful method for modeling complex and nuanced datasets.