/ɛnˈtrəpi/

noun … “measuring uncertainty in a single number.”

Entropy is a fundamental concept in information theory, probability, and thermodynamics that quantifies the uncertainty, disorder, or information content of a system or random variable. In information theory, where the concept was introduced by Claude Shannon, entropy measures the average amount of information produced by a stochastic source of data. Higher entropy corresponds to greater unpredictability, while lower entropy indicates more certainty or redundancy.

For a discrete random variable X with possible outcomes {x₁, x₂, ..., xₙ} and probability distribution P(X), the Shannon entropy is defined as:

H(X) = − Σᵢ₌₁ⁿ P(xᵢ) log₂ P(xᵢ)

Here, P(xᵢ) is the probability of outcome xᵢ, and the logarithm is typically base 2, giving entropy in bits (base e gives nats). By convention, outcomes with P(xᵢ) = 0 contribute zero to the sum, since p log₂ p → 0 as p → 0. Entropy provides a foundation for understanding coding efficiency, data compression, and uncertainty reduction in algorithms such as Decision Trees, where metrics like Information Gain rely on entropy to determine optimal splits.
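A minimal sketch of the formula in Python (the function name shannon_entropy and the example distributions are illustrative, not from any particular library):

```python
import math

def shannon_entropy(probabilities):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p)).

    Outcomes with p == 0 are skipped, since p * log2(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A fair four-sided die: maximum uncertainty over four outcomes.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits

# A heavily skewed distribution: far more predictable, so lower entropy.
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))   # ~0.24 bits
```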

Entropy is closely related to several key concepts. It leverages Probability Distributions to quantify uncertainty, interacts with Expectation Values to assess average information content, and connects to Variance when evaluating dispersion in probabilistic systems. In machine learning, entropy informs feature selection, decision-making under uncertainty, and regularization methods. Beyond information theory, it has analogues in physics as a measure of disorder and in cryptography as a measure of randomness in keys or outputs.
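The tie to Expectation Values can be made concrete: entropy is the expected value of the surprisal −log₂ P(x), i.e. the probability-weighted average information content of the outcomes. A short sketch under that framing (function names are illustrative):

```python
import math

def surprisal(p):
    """Information content of a single outcome, in bits: -log2(p)."""
    return -math.log2(p)

def entropy_as_expectation(distribution):
    """H(X) = E[-log2 P(X)]: the probability-weighted average surprisal."""
    return sum(p * surprisal(p) for p in distribution if p > 0)

# Rare outcomes are more 'surprising', i.e. carry more information ...
print(surprisal(0.5))    # 1.0 bit
print(surprisal(0.01))   # ~6.64 bits
# ... and entropy averages that surprise over the whole distribution.
print(entropy_as_expectation([0.5, 0.25, 0.25]))   # 1.5 bits
```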

Example conceptual workflow for applying entropy in a dataset (a code sketch of these steps follows the list):

1. Identify the target variable with multiple possible outcomes.
2. Compute the probability distribution P(X) of those outcomes.
3. Apply the Shannon entropy formula H(X) = −Σ P(xᵢ) log₂ P(xᵢ).
4. Use the computed entropy to measure uncertainty, guide feature selection, or calculate Information Gain.
5. Interpret high entropy as high unpredictability and low entropy as concentrated or predictable patterns.
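A sketch of that workflow end to end, on a hypothetical toy dataset (the play/outlook columns and all names are invented for illustration): it estimates P(X) from observed label frequencies, computes H(X), and derives the Information Gain of a candidate split.

```python
from collections import Counter
import math

def entropy_of_labels(labels):
    """Steps 1-3: estimate P(X) from observed frequencies, then apply H(X)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Step 4: entropy of the target minus the weighted entropy after a split."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy_of_labels(g) for g in groups.values())
    return entropy_of_labels(labels) - remainder

# Hypothetical data: does someone play tennis, given the outlook?
play    = ["yes", "yes", "no", "no", "no", "no"]
outlook = ["sunny", "sunny", "rain", "rain", "sunny", "rain"]

print(entropy_of_labels(play))           # ~0.92 bits: 2 yes vs 4 no
print(information_gain(play, outlook))   # ~0.46 bits of uncertainty removed
```

Step 5 is the reading of those numbers: a gain near the full target entropy means the feature nearly determines the outcome, while a gain near zero means the split leaves the uncertainty intact.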

Intuitively, entropy is like counting how many yes/no questions you would need on average to guess the outcome of a random event. It captures the essence of uncertainty in a single number, providing a compass for decision-making, data compression, and understanding the flow of information in complex systems.
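That intuition can be checked numerically, assuming ideally chosen questions: a fair coin takes one question, a fair eight-sided die takes three, and a heavily biased coin takes well under one on average (distributions here are illustrative):

```python
import math

def entropy_bits(probs):
    """Average number of ideal yes/no questions needed to pin down the outcome."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))     # fair coin: 1.0 question on average
print(entropy_bits([1 / 8] * 8))    # fair 8-sided die: 3.0 questions
print(entropy_bits([0.9, 0.1]))     # biased coin: ~0.47, mostly predictable
```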