Generating Validation Reports¶

validation/generate_report.py parses City2TABULA validation output folders and produces a single paper-ready markdown report. It automatically finds the latest run in each folder and extracts four sections: accuracy, geometry quality, error stratification, and building height analysis.

When to Use It¶

After running city2tabula -extract-features and the validation notebook on one or more datasets, use this script to turn the raw CSVs into a formatted document you can copy directly into a paper or archive alongside the results.

Input: Validation Output Structure¶

The script expects the folder layout produced by the validation notebook:

outputs/
└── Germany/
    └── c2t_deggendorf/
        └── validation_20260528_190945/   ← latest run picked automatically
            ├── building_summary.csv
            ├── building_validation.csv
            ├── roof_summary.csv
            ├── roof_validation.csv
            ├── wall_summary.csv
            ├── wall_validation.csv
            ├── floor_summary.csv
            ├── floor_validation.csv
            ├── problematic_roofs.csv
            ├── problematic_walls.csv
            └── plots/

If multiple validation_* subfolders exist, the lexicographically latest one (i.e. the most recent timestamp) is used.

Usage¶

Single dataset¶

python validation/generate_report.py validation/outputs/Germany/c2t_deggendorf

Output: validation/outputs/Germany/c2t_deggendorf/report.md

Multiple datasets — combined report¶

python validation/generate_report.py \
  --dataset "Freiburg (DE):validation/outputs/Germany/c2t_freiburg" \
  --dataset "Vienna (AT):validation/outputs/Austria/c2t_vienna_130k" \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output validation/outputs/combined_report.md

Labels can be any string; they appear as the Dataset column in all tables.

Custom output path¶

python validation/generate_report.py \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output my_report.md

Output Sections¶

1. Accuracy¶

RMSE and mean signed difference for every validated attribute across all datasets.

Dataset	Level	Attribute	n	RMSE	Mean diff	Median %err
Freiburg (DE)	Roof	Surface area (m²)	22,411	0.392	−0.004	0.000
Freiburg (DE)	Wall	Surface area (m²)	55,129	0.004	0.000	0.000
Vienna (AT)	Wall	Surface area (m²)	1,180,291	17.360	−0.030	0.000
...

2. Geometry Quality¶

Per surface type: how many surfaces are PostGIS-invalid (ST_IsValid = false) or non-planar. Counted on unique surfaces (area attribute only, to avoid double-counting tilt/azimuth rows).

Dataset	Surface type	n surfaces	Invalid (%)	Non-planar (%)
Freiburg (DE)	Roof	22,411	0.0%	52.2%
Freiburg (DE)	Wall	55,129	89.0%	13.1%
Vienna (AT)	Wall	1,180,291	99.4%	17.2%
...

3. RMSE by Geometry Validity¶

Surface area RMSE split into two groups: surfaces that passed both is_valid and is_planar, and surfaces that failed either check. This isolates the contribution of degenerate geometry to overall error.

Dataset	Surface type	RMSE all	RMSE valid+planar	RMSE invalid/non-planar	n valid	n invalid
Deggendorf (DE)	Wall	38.177	0.016	38.256	3,482	840,307
Vienna (AT)	Wall	17.360	0.003	17.362	240	1,180,051
...

The geometry argument

The Key Takeaways section at the end of the report computes the ratio of invalid/non-planar RMSE to valid+planar RMSE automatically. For example:

Wall area: RMSE for invalid/non-planar surfaces (38.256 m², n=840,307) is 2,431× higher than valid+planar surfaces (0.016 m², n=3,482).

This is the quantitative argument that geometry quality — not the pipeline — drives the remaining error.

4. Building Height by Attachment Status¶

Height RMSE split by has_attached_neighbour. Buildings that share a wall with a neighbour can have their roof surfaces mis-attributed during geometry-based surface assignment, inflating height errors.

Note

This section only appears when has_attached_neighbour is present in building_validation.csv. Run flag_attached_buildings.py before the validation notebook to populate this column.

Example: Full Three-Dataset Run¶

cd city2tabula

python validation/generate_report.py \
  --dataset "Freiburg (DE):validation/outputs/Germany/c2t_freiburg" \
  --dataset "Vienna (AT):validation/outputs/Austria/c2t_vienna_130k" \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output validation/outputs/combined_report.md

# Report written to: validation/outputs/combined_report.md

Open combined_report.md — all four tables are ready to copy into the paper. The Key Takeaways prose sentences can be used directly in the Discussion section.

What the Script Does Not Do¶

It does not re-run validation — it reads existing CSV outputs only.
It does not generate plots — the individual per-dataset plots are already in each plots/ subfolder.
It does not modify any source data or database.