A spatial index is a data structure that organizes geographic features by their location so that spatial queries — "what intersects this area?", "what is nearest to this point?" — can be answered without scanning every feature in the dataset. It is the spatial equivalent of a database index on a column.

What it is and why it matters

Without an index, a query like "find all faults within this map sheet" forces the system to test every geometry one by one. A spatial index narrows the search using a two-phase approach:

  1. Filter step — use each feature's bounding box (minimum bounding rectangle) and the index tree to quickly discard features that cannot possibly match.
  2. Refinement step — run the exact, expensive geometry test only on the small surviving candidate set.
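
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it: the feature ids, coordinates, and the function name within_distance are made up for the example, and the filter step is a plain loop over bounding-box comparisons rather than a walk through an index tree.

```python
from math import hypot

# Hypothetical sample data: feature id -> (x, y) point location.
features = {
    "a": (1.0, 1.0),
    "b": (4.5, 4.5),  # inside the query's bounding square, outside the circle
    "c": (9.0, 9.0),
    "d": (3.0, 2.0),
}

def within_distance(features, cx, cy, radius):
    """Return (hits, candidates) for points within `radius` of (cx, cy)."""
    # Filter step: cheap comparisons against the bounding box of the query circle.
    xmin, xmax = cx - radius, cx + radius
    ymin, ymax = cy - radius, cy + radius
    candidates = [
        fid for fid, (x, y) in features.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    ]
    # Refinement step: exact distance test, run only on the surviving candidates.
    hits = [
        fid for fid in candidates
        if hypot(features[fid][0] - cx, features[fid][1] - cy) <= radius
    ]
    return hits, candidates

hits, candidates = within_distance(features, 2.0, 2.0, 3.0)
print(candidates)  # features that passed the bounding-box filter
print(hits)        # features that also passed the exact test
```

Feature "b" passes the filter but fails the exact test, which is why the refinement step cannot be skipped.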

The most common structure is the R-tree, which groups nearby bounding boxes into a hierarchy of nested rectangles. Variants include the R*-tree; some systems use grid or quadtree indexes instead, and PostGIS builds its R-tree-style index on PostgreSQL's GiST framework. The payoff can be large: on a table with millions of features, a selective query can drop from seconds to milliseconds.
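
To make the idea of nested rectangles concrete, here is a rough two-level sketch in Python. It is an assumption-laden toy, not a real R-tree: boxes are simply sorted by centre x and packed into fixed-size groups, there is only one level of grouping, and the names build, search, and enclose are invented for the example. Real R-trees nest recursively and choose groupings more carefully.

```python
def intersects(a, b):
    """True if boxes a and b, each (xmin, ymin, xmax, ymax), overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def enclose(boxes):
    """Smallest rectangle that contains every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def build(boxes, capacity=4):
    """Pack boxes into groups of `capacity`, each with an enclosing rectangle."""
    # Sort by centre x (doubled, which does not change the order) so that
    # neighbours in the list are neighbours in space.
    items = sorted(enumerate(boxes), key=lambda item: item[1][0] + item[1][2])
    leaves = []
    for i in range(0, len(items), capacity):
        chunk = items[i:i + capacity]
        leaves.append((enclose([box for _, box in chunk]), chunk))
    return leaves

def search(leaves, query):
    """Return (sorted feature ids whose box intersects query, rectangle tests made)."""
    hits, tests = [], 0
    for mbr, entries in leaves:
        tests += 1
        if not intersects(mbr, query):
            continue  # the whole group is discarded with a single test
        for fid, box in entries:
            tests += 1
            if intersects(box, query):
                hits.append(fid)
    return sorted(hits), tests

# Hypothetical data: 16 unit-wide boxes spaced along the x axis.
boxes = [(2.0 * i, 0.0, 2.0 * i + 1.0, 1.0) for i in range(16)]
leaves = build(boxes)
hits, tests = search(leaves, (9.5, 0.0, 12.5, 1.0))
print(hits, tests)
```

With this data the query tests 8 rectangles (4 group rectangles plus the 4 entries of the one group that overlaps) instead of all 16 boxes. The saving grows as the dataset and the number of levels grow.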

Concrete example

In PostGIS, you create a spatial index with:

CREATE INDEX idx_units_geom ON geological_units USING GIST (geom);

After indexing, the query planner can use the GiST index automatically for an ST_Intersects query against a bounding box. File formats carry indexes too: GeoPackage stores an R-tree, Shapefiles rely on sidecar files (.qix for GDAL and MapServer, .sbn/.sbx for Esri software), and FlatGeobuf embeds a packed Hilbert R-tree for fast partial reads.
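
The "Hilbert" in a packed Hilbert R-tree refers to sorting features along a Hilbert space-filling curve before packing them, so that features stored next to each other are also close in space. The sketch below shows the textbook grid-cell-to-curve-position conversion in Python. It is offered only as an illustration of the ordering idea; it is not taken from FlatGeobuf's code, and the grid size and feature centres are hypothetical.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Hypothetical feature centres, already snapped to a 4 x 4 grid.
centres = {"p": (3, 0), "q": (0, 0), "r": (1, 2), "s": (2, 3)}
order = sorted(centres, key=lambda k: hilbert_index(4, *centres[k]))
print(order)
```

Sorting by this key and then packing consecutive features into nodes gives groups with compact bounding rectangles, which is what makes reading only part of a file practical.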

Common pitfall

Forgetting to build (or rebuild) the index. A PostGIS table loaded with plain INSERT or COPY has no spatial index until you create one, and spatial queries crawl until you do; some loaders can create it for you (shp2pgsql with the -I flag, for example), so check rather than assume. After bulk loads, also run ANALYZE so the planner has fresh statistics. Note too that the index works on bounding boxes: it accelerates queries but does not replace the exact geometry test.

Related reading