GeoParquet

GeoParquet is an open specification that defines how to store vector geographic features — geometries plus attributes — inside Apache Parquet, a column-oriented binary format widely used in data analytics. It adds standardized geo metadata and a geometry column (typically encoded as WKB) so that spatial data fits naturally into modern data-engineering tooling.

Why it matters

Traditional vector formats like Shapefile and GeoPackage are row-oriented and tied to GIS-specific stacks. GeoParquet brings spatial data into the columnar, analytics-first world: it compresses well, supports predicate and column pushdown (read only the columns and row groups you need), and works directly with engines like DuckDB, Apache Arrow, GeoPandas, and cloud query services. For large tables — millions of polygons or points — this is dramatically faster and cheaper than legacy formats.

Concrete example

You can read a remote GeoParquet file with DuckDB's spatial extension and query only matching rows without downloading the whole file:

SELECT name, ST_Area(geometry)
FROM read_parquet('s3://bucket/units.parquet')
WHERE lithology = 'basalt';

In Python, geopandas.read_parquet("units.parquet") round-trips a GeoDataFrame including its CRS, which GeoParquet stores in the file's geo metadata (commonly as a PROJJSON record).

Common pitfall

GeoParquet is optimized for bulk analytics and bulk reads, not for random single-feature edits or transactional updates — for an editable working database use GeoPackage or PostGIS instead. Also confirm the writer version: early files (pre-1.0) may lack the standardized metadata that newer readers expect.

GeoParquet

Why it matters

Concrete example

Common pitfall

Related reading

Need this workflow turned into a reliable terrain or GIS system?