B-tree | CΛTΞИCOΔΞ

/biː-triː/

noun — "balanced tree for efficient data retrieval."

A B-tree is a self-balancing tree data structure commonly used in databases and file systems to maintain sorted data and support efficient insertion, deletion, and search. It is designed to minimize disk access on large datasets: each node holds many keys (typically sized to fit a disk block or page), which gives the tree a high fan-out, keeps its height low, and reduces the number of I/O operations needed to locate an element. Nodes are also kept at least about half full, so the height stays logarithmic in the number of keys. B-trees are a cornerstone of indexing, providing logarithmic-time lookups, inserts, and deletions even on massive datasets.
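To make the link between fan-out and height concrete, here is a small sketch of the standard worst-case height bound. It assumes a tree whose non-root internal nodes have at least t children (the "minimum degree", a parameter introduced here for illustration), in which case a tree of height h holds at least 2·t^h − 1 keys. The function name max_height is hypothetical.

```python
def max_height(num_keys: int, min_degree: int) -> int:
    """Largest possible height (root at height 0) of a B-tree holding num_keys keys.

    Assumes every non-root internal node has at least min_degree children,
    so a tree of height h holds at least 2 * min_degree**h - 1 keys.
    """
    if num_keys < 1 or min_degree < 2:
        raise ValueError("need num_keys >= 1 and min_degree >= 2")
    h = 0
    # Grow h while a tree of height h + 1 could still fit within num_keys keys.
    while 2 * min_degree ** (h + 1) <= num_keys + 1:
        h += 1
    return h


# One million keys: a node with at least 50 children per level needs
# far fewer levels (and so far fewer page reads) than a minimum degree of 2.
wide = max_height(1_000_000, 50)
narrow = max_height(1_000_000, 2)
```

Since each level visited usually costs one page read, the height is a rough proxy for the number of disk accesses per lookup.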

Technically, a B-tree of order m has nodes that hold at most m-1 keys and m children; under the common definition, every node except the root also has at least ⌈m/2⌉ children. All leaf nodes reside at the same depth, which keeps the structure balanced. Internal nodes store keys that act as separation values, guiding searches toward the correct subtree. When an insertion pushes a node past its capacity, the node splits around its median key, which moves up into the parent and may cause further splits up to the root; conversely, underflow during deletion triggers redistribution of keys from a sibling or a merge with it. This lets B-trees handle dynamic datasets efficiently, making them well suited to database indexes and file system directories where both reads and writes must be fast.
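The split step can be sketched as follows. This is a minimal illustration, not a full insertion routine: the Node class and split function are hypothetical names, and the sketch assumes the node has already received one key too many and that the caller inserts the returned median into the parent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    keys: List[int] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def split(node: Node):
    """Split an overfull node around its median key.

    Returns (left, median, right); the caller is expected to place
    median in the parent, with left and right as its neighbouring children.
    """
    mid = len(node.keys) // 2
    median = node.keys[mid]
    # Keys below the median go left, keys above go right; the median itself
    # moves up. Child pointers are divided to match (no-op for a leaf).
    left = Node(node.keys[:mid], node.children[:mid + 1])
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    return left, median, right


# An order-4 leaf (at most 3 keys) that has just received a fourth key.
left, median, right = split(Node(keys=[10, 20, 30, 40]))
```

Here the median 30 would move into the parent, leaving 10 and 20 in the left node and 40 in the right one.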

In workflow terms, consider a relational database using a B-tree to index a customer table on the customer_id column. When a new customer is added, the B-tree ensures the customer_id is inserted at the correct position while maintaining balanced nodes. When querying a customer, the tree guides the search through internal nodes, locating the record with minimal disk accesses, even if the table contains millions of entries.
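As one way to try this workflow, the sketch below uses Python's built-in sqlite3 module, whose indexes are stored as B-trees. The table layout, the index name idx_customer_id, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
# The index is what the engine maintains as a B-tree keyed on customer_id.
conn.execute("CREATE INDEX idx_customer_id ON customers (customer_id)")

# Each insert also places the new customer_id at its sorted position in the index.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 10001)],
)

query = "SELECT name FROM customers WHERE customer_id = ?"

# Ask the planner how it would run the lookup; the last column is a text description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (4242,)).fetchall()
plan_text = " ".join(row[-1] for row in plan)

name = conn.execute(query, (4242,)).fetchone()[0]
```

The plan description should mention idx_customer_id, indicating an index search rather than a full table scan; the exact wording varies between SQLite versions.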

For a simplified code example demonstrating a B-tree search in pseudocode:

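function searchBTree(node, key):
    // Assumes leaf nodes have null children, so recursion ends here.
    if node is null:
        return null
    // Keys within a node are sorted; i runs over 0 .. numKeys - 1.
    for i in 0 .. node.numKeys - 1:
        if key == node.keys[i]:
            return node.values[i]
        else if key < node.keys[i]:
            // key can only be in the subtree to the left of keys[i]
            return searchBTree(node.children[i], key)
    // key is greater than every key in this node: follow the last child
    return searchBTree(node.children[node.numKeys], key)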

This illustrates how a B-tree navigates through internal nodes and children to locate the desired key efficiently without scanning the entire dataset.
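A runnable counterpart to the pseudocode might look like the following sketch. The BTreeNode class and the hand-built three-leaf tree are assumptions for illustration; the in-node scan is replaced by a binary search (bisect_left), which is a common choice when nodes hold many keys.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class BTreeNode:
    keys: List[int]
    values: List[Any]
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[BTreeNode], key: int) -> Optional[Any]:
    """Return the value stored under key, or None if the key is absent."""
    while node is not None:
        # Index of the first key >= the search key within this node.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return node.values[i]
        # Descend into the child between keys[i-1] and keys[i]; stop at a leaf.
        node = node.children[i] if node.children else None
    return None


# A small hand-built tree: one root with two keys and three leaves.
root = BTreeNode(
    keys=[20, 40],
    values=["twenty", "forty"],
    children=[
        BTreeNode([5, 10], ["five", "ten"]),
        BTreeNode([25, 30], ["twenty-five", "thirty"]),
        BTreeNode([45, 50], ["forty-five", "fifty"]),
    ],
)

hit = search(root, 30)
miss = search(root, 35)
```

The lookup for 30 visits only the root and the middle leaf; the lookup for 35 follows the same path and returns None.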

Common variants include B+ trees, which store all actual data in leaf nodes and use internal nodes only as routing keys, and B* trees, which keep nodes fuller (at least about two-thirds full) by redistributing keys to sibling nodes before resorting to a split. B-trees and their derivatives underpin database indexing strategies, file systems like NTFS and ext4, and key-value storage engines, enabling high-performance retrieval and updates on large datasets.
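The B+ tree layout can be illustrated with a range scan. The sketch below assumes two things not stated above: that leaves are chained with a next pointer (a common feature of B+ tree implementations), and that a routing key k sends keys greater than or equal to k to the right. The tree is limited to a single internal level for brevity, and all class and function names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    values: List[str]
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Internal:
    keys: List[int]       # routing keys only, no record data
    children: List[Leaf]  # a single internal level, for brevity


def range_scan(root: Internal, lo: int, hi: int) -> List[Tuple[int, str]]:
    """Collect all (key, value) pairs with lo <= key <= hi."""
    # Route to the leaf that could contain lo, then walk the leaf chain.
    leaf: Optional[Leaf] = root.children[bisect_right(root.keys, lo)]
    out: List[Tuple[int, str]] = []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next
    return out


c = Leaf([9, 11], ["i", "k"])
b = Leaf([5, 7], ["e", "g"], next=c)
a = Leaf([1, 3], ["a", "c"], next=b)
tree = Internal(keys=[5, 9], children=[a, b, c])

result = range_scan(tree, 4, 9)
```

Because all records live in the leaves, the scan descends once and then moves sideways through the chain without revisiting internal nodes.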

Conceptually, a B-tree is like a multi-level library index: each shelf lists references that direct the reader to the next level, ultimately reaching the exact book with minimal walking between shelves.

See Index, Database, B+ tree.
