SQL Extraction Pipeline¶
This section documents the eight SQL scripts that transform raw 3D building geometry from the CityDB database into a structured set of building features ready for TABULA classification.
The scripts run in order, numbered 01_ through 08_. Each script reads from the output of the previous one, so the pipeline is strictly sequential within a batch of buildings.
What it does¶
Given a batch of building IDs, the pipeline:
- Finds all surface polygons that belong to each building (roof, wall, ground faces).
- Flattens compound multi-polygon geometries into individual polygon faces.
- Computes a surface normal for each face, then derives tilt, azimuth, area, and height.
- Aggregates those per-surface values into a single summary row per building.
- Approximates building volume from height × footprint area.
- Refines the storey count and total floor area.
- Matches each building to its closest TABULA archetype using nearest-neighbour search in feature space.
- Builds the resolved surface output table, excluding party-wall surfaces.
Data flow¶
flowchart TD
A[("CityDB<br>(lod2 / lod3 schema)")] -->|script 01| B["_child_feature<br>One row per surface feature"]
B -->|script 02| C["_child_feature_geom_dump<br>One row per polygon face"]
C -->|script 03| D["_surface_raw<br>Tilt, azimuth, area, height per face"]
D -->|script 04| E["_building<br>Aggregated building summary"]
E -->|script 05| F["_building<br>Volume added"]
F -->|script 06| G["_building<br>Storeys + floor area refined"]
G -->|script 07| H["_building<br>TABULA variant code assigned"]
I[("tabula.tabula_variant<br>(reference archetypes)")] -->|script 07| H
D -->|script 08| J["_surface<br>Resolved surface output<br>geometry + attributes"]
Script reference¶
| Script | Purpose | Writes to |
|---|---|---|
| 01 — Get child features | Spatially match surface features to each building solid | _child_feature |
| 02 — Dump geometry | Explode multi-polygon surfaces to individual polygon faces | _child_feature_geom_dump |
| 03 — Surface attributes | Compute surface normal, tilt, azimuth, area, and height per face | _surface_raw |
| 04 — Building features | Aggregate surface attributes into one row per building | _building |
| 05 — Volume | Approximate building volume from height × footprint | _building (UPDATE) |
| 06 — Storeys | Refine storey count; overwrite floor area as footprint × storeys | _building (UPDATE) |
| 07 — TABULA labelling | Nearest-neighbour match to closest TABULA archetype | _building (UPDATE) |
| 08 — Build surface | Populate resolved surface output with geometry and attributes | _surface |
Key concepts¶
LoD (Level of Detail)¶
CityGML defines several levels of geometric detail for buildings. City2TABULA works with LoD2 (simple roof shapes, no interior) and LoD3 (detailed facades). Each LoD is stored in its own database schema (lod2 or lod3); the scripts use {lod_schema} as a placeholder that is substituted at runtime.
Surface types¶
Each building solid is decomposed into typed surface features:
| CityGML type | Meaning |
|---|---|
RoofSurface |
The roof polygon(s) |
WallSurface |
Facade/wall polygon(s) |
GroundSurface |
The building footprint polygon(s) on the ground |
Scripts 03 and 04 treat these three types differently — for example, tilt and azimuth are computed differently for roofs versus walls, and the footprint area comes from GroundSurface only.
Idempotency¶
Every INSERT script checks whether the building has already been processed and skips it if so. This means the pipeline is safe to re-run against a partially-processed batch without creating duplicate rows.
SQL templating¶
Scripts contain {placeholder} tokens that are resolved before execution:
| Placeholder | Example value |
|---|---|
{lod_schema} |
lod2 |
{lod_level} |
2 |
{city2tabula_schema} |
city2tabula |
{building_ids} |
(1, 2, 3, ...) |
{srid} |
25832 |