Index | CΛTΞИCOΔΞ

/ˈɪn.deks/

noun — "data structure for fast lookup."

Index is a specialized data structure used in computing and database systems to improve the speed and efficiency of data retrieval operations. It functions as a roadmap or table of contents, allowing a system to quickly locate the position of a desired item without scanning the entire dataset. Indexes are essential in relational and non-relational databases, search engines, file systems, and large-scale storage systems, where rapid access to specific records is critical.

Technically, an index stores key-value pairs or references that map a search key (such as a column value) to the physical location of the corresponding data. Common implementations include B-trees, B+ trees, hash tables, inverted indexes, and bitmaps, each optimized for different query types, data distributions, and performance characteristics. Indexes may be clustered, where the data rows are physically ordered according to the index, or non-clustered, where the index maintains separate pointers to the data. Multiple indexes can coexist on a dataset to support diverse access patterns.

In workflow terms, consider a relational database storing millions of customer records. Without an index on the email field, a query searching for a specific email would require a full table scan, checking each row sequentially. By creating an index on the email column, the database engine can quickly locate the desired row using the index structure, dramatically reducing query latency. Similarly, search engines build inverted indexes that map keywords to document locations, enabling rapid retrieval of relevant pages in response to user queries.

For a concrete example, a simple index on an array of integers in code could be represented as:

data = [34, 7, 23, 32, 5, 62]
index = { value: position for position, value in enumerate(data) }
# index lookup
position_of_23 = index[23]  # returns 2

This demonstrates how an index allows immediate access to the position of a value without scanning the entire array.

Indexes also store auxiliary information, such as minimum and maximum values, counts, or aggregated statistics, to accelerate query operations. Maintaining indexes incurs storage and update overhead, as each insertion, deletion, or modification of data requires updating the corresponding indexes. Database designers balance read performance against write overhead, selecting indexes carefully based on workload patterns.

Advanced indexing strategies include partial indexes, covering indexes, multi-column indexes, and spatial indexes for specialized data types like geolocation coordinates. In distributed databases and data warehouses, indexes support query planners in generating efficient execution strategies, while replication and partitioning ensure durability and availability.

Conceptually, an index is like a library catalog: instead of scanning every book on the shelves, a reader consults the catalog to immediately locate the desired book by author, title, or subject, enabling rapid and precise access to information.

See Database, Query, B-tree.

Structure

Data

Database