/ˈmæksɪməm ˈlaɪk.li.hʊd ˌɛstɪˈmeɪʃən/

noun … “finding the parameters that make your data most believable.”

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a probabilistic model by maximizing the likelihood of the observed data under those parameters. In essence, MLE chooses the parameter values that make the observed outcomes most probable, providing a principled foundation for parameter inference in a wide range of models, from simple Probability Distributions to complex regression and machine learning frameworks.

Formally, given data X = {x₁, x₂, ..., xₙ} and a likelihood function L(θ | X) depending on parameters θ, MLE finds:

θ̂ = argmax_θ L(θ | X) = argmax_θ Πᵢ₌₁ⁿ f(xᵢ | θ)

where f(xᵢ | θ) is the probability density or mass function of observation xᵢ given parameters θ. In practice, the log-likelihood log L(θ | X) is usually maximized instead: because the logarithm is monotone, the maximizer θ̂ is unchanged, and the product of densities becomes a sum, which is simpler and more numerically stable. Under standard regularity conditions, maximum likelihood estimators are consistent, asymptotically normal, and asymptotically efficient.
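
As a concrete sketch, consider a Bernoulli coin-flip model, where the analytic MLE is the sample proportion p̂ = k/n. The Python snippet below (with made-up illustrative data) writes down the log-likelihood and checks the analytic answer against a brute-force grid search:

```python
import numpy as np

# Illustrative data: 10 Bernoulli trials (coin flips), 1 = heads.
# These values are invented purely for demonstration.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
n, k = len(x), x.sum()

def log_likelihood(p, x):
    """Bernoulli log-likelihood: sum over i of log f(x_i | p)."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Analytic MLE: setting d/dp log L(p | X) = 0 gives p_hat = k / n.
p_hat_analytic = k / n

# Numerical check: evaluate log L on a grid and take the argmax.
grid = np.linspace(0.01, 0.99, 999)
p_hat_grid = grid[np.argmax([log_likelihood(p, x) for p in grid])]

print(f"analytic MLE: {p_hat_analytic:.3f}, grid search: {p_hat_grid:.3f}")
```

Both approaches agree, which is the point: the log transform changes the arithmetic, not the location of the maximum.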

Maximum Likelihood Estimation is deeply connected to numerous concepts in statistics and machine learning. It draws on Expectation Values and Variance to characterize estimator bias and precision, and it underpins models such as Logistic Regression, Linear Regression, and probabilistic generative models including Naive Bayes. When the likelihood has no closed-form maximizer, MLE relies on numerical optimization methods such as Gradient Descent.
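
To make that last connection concrete, here is a minimal sketch of fitting a Logistic Regression model by MLE using plain gradient ascent on the log-likelihood (Gradient Descent on its negative). The synthetic data, learning rate, and iteration count are illustrative assumptions, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (assumed for demonstration):
# an intercept column plus one feature.
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_theta = np.array([-0.5, 2.0])
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_theta))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Gradient ascent on the Bernoulli log-likelihood of logistic regression:
# the gradient is X^T (y - sigmoid(X theta)).
theta = np.zeros(2)
learning_rate = 0.5  # assumed step size
for _ in range(5000):
    grad = X.T @ (y - sigmoid(X @ theta)) / len(y)
    theta += learning_rate * grad  # ascend the log-likelihood

print("estimated theta:", theta)  # should approach true_theta
```

Each step nudges θ in the direction that increases the log-likelihood, which is exactly the "knob-tuning" picture described below.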

Example conceptual workflow for MLE (a code sketch follows the list):

collect observed dataset X
define a parametric model with unknown parameters θ
construct the likelihood function L(θ | X) based on model
compute the log-likelihood for numerical stability
maximize log-likelihood analytically or numerically to obtain θ̂
evaluate estimator properties and confidence intervals
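
The following is a minimal end-to-end sketch of that workflow, assuming a Gaussian model with unknown mean and standard deviation fit numerically via scipy.optimize.minimize; the confidence interval from the inverse Hessian is the standard asymptotic approximation, not an exact interval:

```python
import numpy as np
from scipy.optimize import minimize

# 1. Collect observed dataset X (synthetic here, for illustration).
rng = np.random.default_rng(42)
data = rng.normal(loc=5.0, scale=2.0, size=500)

# 2. Parametric model: Gaussian with theta = (mu, log_sigma);
#    log_sigma keeps the scale parameter positive during optimization.
def neg_log_likelihood(theta, x):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    # 3-4. Gaussian log-likelihood, negated because we use a minimizer.
    return -np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                   - 0.5 * ((x - mu) / sigma) ** 2)

# 5. Maximize the log-likelihood (minimize its negative) numerically.
result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]),
                  args=(data,), method="BFGS")
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# 6. Approximate standard errors from the inverse Hessian,
#    invoking the asymptotic normality of the MLE.
se = np.sqrt(np.diag(result.hess_inv))
print(f"mu_hat = {mu_hat:.3f} ± {1.96 * se[0]:.3f} (approx. 95% CI)")
print(f"sigma_hat = {sigma_hat:.3f}")
```

The comments mirror the six steps above; swapping in a different parametric model only changes step 2 and the log-likelihood in steps 3-4.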

Intuitively, Maximum Likelihood Estimation is like tuning the knobs of a probabilistic machine to make the observed data as likely as possible: each parameter adjustment increases the plausibility of what actually happened, guiding you toward the most reasonable explanation consistent with the evidence. It transforms raw data into informed, optimal parameter estimates, giving structure to uncertainty.