Generating Validation Reports

validation/generate_report.py parses City2TABULA validation output folders and produces a single paper-ready Markdown report. For each dataset folder it automatically finds the latest run and builds four sections from its CSVs: accuracy, geometry quality, error stratification by geometry validity, and building height analysis.


When to Use It

After running city2tabula -extract-features and the validation notebook on one or more datasets, use this script to turn the raw CSVs into a formatted document you can copy directly into a paper or archive alongside the results.


Input: Validation Output Structure

The script expects the folder layout produced by the validation notebook:

outputs/
└── Germany/
    └── c2t_deggendorf/
        └── validation_20260528_190945/   ← latest run picked automatically
            ├── building_summary.csv
            ├── building_validation.csv
            ├── roof_summary.csv
            ├── roof_validation.csv
            ├── wall_summary.csv
            ├── wall_validation.csv
            ├── floor_summary.csv
            ├── floor_validation.csv
            ├── problematic_roofs.csv
            ├── problematic_walls.csv
            └── plots/

If multiple validation_* subfolders exist, the lexicographically latest one (i.e. the most recent timestamp) is used.
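The selection rule can be sketched in a few lines. This is an illustration rather than the script's actual code: the function name latest_run is hypothetical, and it assumes every run folder follows the zero-padded validation_YYYYMMDD_HHMMSS pattern shown above.

```python
from pathlib import Path
from typing import Optional


def latest_run(dataset_dir: Path) -> Optional[Path]:
    """Return the newest validation_* subfolder, or None if there is none.

    Hypothetical helper: the real script may implement this differently.
    """
    runs = [p for p in dataset_dir.glob("validation_*") if p.is_dir()]
    # Names such as validation_20260528_190945 are zero-padded, so comparing
    # them as strings gives the same order as comparing the timestamps.
    return max(runs, key=lambda p: p.name, default=None)
```

String comparison only matches chronological order while the timestamp format stays fixed-width; a folder renamed by hand (for example validation_final) would sort after every timestamped run.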


Usage

Single dataset

python validation/generate_report.py validation/outputs/Germany/c2t_deggendorf

Output: validation/outputs/Germany/c2t_deggendorf/report.md

Multiple datasets — combined report

python validation/generate_report.py \
  --dataset "Freiburg (DE):validation/outputs/Germany/c2t_freiburg" \
  --dataset "Vienna (AT):validation/outputs/Austria/c2t_vienna_130k" \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output validation/outputs/combined_report.md

Labels can be any string; they appear as the Dataset column in all tables.
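As a rough illustration of the "Label:path" convention, the sketch below splits one --dataset value into its two parts. The helper name parse_dataset_arg is hypothetical, and splitting on the first colon is an assumption: under that rule a label could not itself contain a colon, and the script may handle this differently.

```python
from typing import Tuple


def parse_dataset_arg(spec: str) -> Tuple[str, str]:
    """Split a 'Label:path' string into (label, path).

    Hypothetical helper; assumes the first colon separates label from path.
    """
    label, sep, path = spec.partition(":")
    if not sep or not label.strip() or not path.strip():
        raise ValueError(f"expected 'Label:path', got {spec!r}")
    return label.strip(), path.strip()


label, path = parse_dataset_arg(
    "Freiburg (DE):validation/outputs/Germany/c2t_freiburg"
)
print(label)  # the text shown in the Dataset column
print(path)   # the folder that is searched for validation_* runs
```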

Custom output path

python validation/generate_report.py \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output my_report.md

Output Sections

1. Accuracy

RMSE and mean signed difference for every validated attribute across all datasets.

| Dataset | Level | Attribute | n | RMSE | Mean diff | Median %err |
|---|---|---|---|---|---|---|
| Freiburg (DE) | Roof | Surface area (m²) | 22,411 | 0.392 | −0.004 | 0.000 |
| Freiburg (DE) | Wall | Surface area (m²) | 55,129 | 0.004 | 0.000 | 0.000 |
| Vienna (AT) | Wall | Surface area (m²) | 1,180,291 | 17.360 | −0.030 | 0.000 |
...
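For readers who want to check a row by hand, the three metrics can be sketched as follows. The sign convention (computed minus reference) and the definition of the percentage error (absolute difference relative to the reference value) are assumptions made for this example; the script's own definitions may differ.

```python
import math
import statistics
from typing import Dict, Sequence


def accuracy_metrics(
    computed: Sequence[float], reference: Sequence[float]
) -> Dict[str, float]:
    """RMSE, mean signed difference and median absolute % error.

    Assumes diff = computed - reference (not confirmed by the script).
    """
    if len(computed) != len(reference) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [c - r for c, r in zip(computed, reference)]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_diff = statistics.fmean(diffs)
    # Skip zero reference values, where a relative error is undefined.
    pct = [abs(d) / abs(r) * 100 for d, r in zip(diffs, reference) if r != 0]
    median_pct = statistics.median(pct) if pct else float("nan")
    return {
        "n": len(diffs),
        "rmse": rmse,
        "mean_diff": mean_diff,
        "median_pct_err": median_pct,
    }
```

The median is far less sensitive to outliers than the RMSE, which is one way a row can show a large RMSE next to a median error of 0.000.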

2. Geometry Quality

For each surface type, this section reports the share of surfaces that are PostGIS-invalid (ST_IsValid = false) or non-planar. Counts are taken over unique surfaces, using the area attribute rows only, so that the tilt and azimuth rows for the same surface are not double-counted.

| Dataset | Surface type | n surfaces | Invalid (%) | Non-planar (%) |
|---|---|---|---|---|
| Freiburg (DE) | Roof | 22,411 | 0.0% | 52.2% |
| Freiburg (DE) | Wall | 55,129 | 89.0% | 13.1% |
| Vienna (AT) | Wall | 1,180,291 | 99.4% | 17.2% |
...
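The counting rule (one row per surface, area attribute only) might look like the sketch below. The keys surface_id, attribute, is_valid and is_planar are illustrative names chosen for this example; only is_valid and is_planar are mentioned in this document, and the real CSV column names may differ.

```python
from typing import Dict, Iterable, Mapping


def geometry_quality(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """Share of invalid and non-planar surfaces among unique surfaces.

    Column names here are assumptions, not the script's confirmed schema.
    """
    # Keep only area rows and de-duplicate by surface id, so tilt and
    # azimuth rows for the same surface are not counted again.
    unique = {}
    for row in rows:
        if row["attribute"] == "area":
            unique[row["surface_id"]] = row
    n = len(unique)
    if n == 0:
        return {"n_surfaces": 0, "invalid_pct": 0.0, "non_planar_pct": 0.0}
    invalid = sum(1 for r in unique.values() if not r["is_valid"])
    non_planar = sum(1 for r in unique.values() if not r["is_planar"])
    return {
        "n_surfaces": n,
        "invalid_pct": 100.0 * invalid / n,
        "non_planar_pct": 100.0 * non_planar / n,
    }
```

The two percentages are computed independently, so a surface that is both invalid and non-planar contributes to both columns and the columns need not sum to 100%.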

3. RMSE by Geometry Validity

Surface area RMSE split into two groups: surfaces that passed both is_valid and is_planar, and surfaces that failed either check. This isolates the contribution of degenerate geometry to overall error.

| Dataset | Surface type | RMSE all | RMSE valid+planar | RMSE invalid/non-planar | n valid | n invalid |
|---|---|---|---|---|---|---|
| Deggendorf (DE) | Wall | 38.177 | 0.016 | 38.256 | 3,482 | 840,307 |
| Vienna (AT) | Wall | 17.360 | 0.003 | 17.362 | 240 | 1,180,051 |
...
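The split behind these columns can be sketched as below: a surface goes into the clean group only if it passes both checks, and into the other group if it fails either one. The diff key (signed area error per surface) and the function names are assumptions for illustration.

```python
import math
from typing import Dict, Iterable, List, Mapping


def _rmse(diffs: List[float]) -> float:
    """Root-mean-square of a list of errors; NaN for an empty group."""
    if not diffs:
        return float("nan")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def rmse_by_validity(rows: Iterable[Mapping[str, object]]) -> Dict[str, float]:
    """RMSE overall, for valid+planar surfaces, and for all other surfaces."""
    clean: List[float] = []
    degenerate: List[float] = []
    for row in rows:
        # Failing either check is enough to land in the degenerate group.
        passed = bool(row["is_valid"]) and bool(row["is_planar"])
        (clean if passed else degenerate).append(float(row["diff"]))
    return {
        "rmse_all": _rmse(clean + degenerate),
        "rmse_valid_planar": _rmse(clean),
        "rmse_invalid_or_non_planar": _rmse(degenerate),
        "n_valid": len(clean),
        "n_invalid": len(degenerate),
    }
```

Because the overall RMSE weights each group by its surface count, a dataset in which nearly all surfaces fail a check will show an overall RMSE close to the invalid/non-planar RMSE, as in both example rows above.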

The geometry argument

The Key Takeaways section at the end of the report computes the ratio of invalid/non-planar RMSE to valid+planar RMSE automatically. For example:

Wall area: RMSE for invalid/non-planar surfaces (38.256 m², n=840,307) is 2,431× higher than valid+planar surfaces (0.016 m², n=3,482).

This is the quantitative argument that geometry quality — not the pipeline — drives the remaining error.
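To reproduce the ratio by hand, divide the two RMSE values. The snippet below is a minimal sketch using the rounded Deggendorf figures from the table; rmse_ratio is a hypothetical name, not a function from the script.

```python
def rmse_ratio(rmse_degenerate: float, rmse_clean: float) -> float:
    """How many times larger the invalid/non-planar RMSE is."""
    if rmse_clean <= 0:
        raise ValueError("valid+planar RMSE must be positive")
    return rmse_degenerate / rmse_clean


# Rounded values as printed in the table (Deggendorf walls).
ratio = rmse_ratio(38.256, 0.016)
print(f"{ratio:,.0f}x higher")
```

With the rounded inputs this gives roughly 2,391×, slightly below the 2,431× quoted in the example sentence. One plausible explanation, stated here as an assumption, is that the script divides the unrounded RMSE values before formatting them; the three-decimal valid+planar RMSE in particular leaves room for a difference of that size.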

4. Building Height by Attachment Status

Height RMSE split by has_attached_neighbour. Buildings that share a wall with a neighbour can have their roof surfaces mis-attributed during geometry-based surface assignment, inflating height errors.

Note

This section only appears when has_attached_neighbour is present in building_validation.csv. Run flag_attached_buildings.py before the validation notebook to populate this column.
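The conditional behaviour described in the note can be sketched as follows: if the column is absent the function returns None, which stands in for skipping the section. The height_diff column name and the accepted boolean spellings are assumptions made for this example; only has_attached_neighbour comes from this document.

```python
import csv
import io
import math
from collections import defaultdict
from typing import Dict, Optional, TextIO


def height_rmse_by_attachment(
    handle: TextIO, diff_column: str = "height_diff"
) -> Optional[Dict[str, float]]:
    """Height RMSE per attachment group, or None if the flag column is missing."""
    reader = csv.DictReader(handle)
    if "has_attached_neighbour" not in (reader.fieldnames or []):
        return None  # section would be omitted from the report
    squared = defaultdict(list)
    for row in reader:
        # Assumed encodings of a true value; adjust to the actual CSV.
        flag = row["has_attached_neighbour"].strip().lower()
        group = "attached" if flag in {"true", "t", "1"} else "detached"
        squared[group].append(float(row[diff_column]) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in squared.items()}


# Tiny in-memory stand-in for building_validation.csv (made-up values).
sample = io.StringIO(
    "building_id,height_diff,has_attached_neighbour\n"
    "1,3.0,True\n"
    "2,4.0,True\n"
    "3,1.0,False\n"
)
print(height_rmse_by_attachment(sample))
```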


Example: Full Three-Dataset Run

cd city2tabula

python validation/generate_report.py \
  --dataset "Freiburg (DE):validation/outputs/Germany/c2t_freiburg" \
  --dataset "Vienna (AT):validation/outputs/Austria/c2t_vienna_130k" \
  --dataset "Deggendorf (DE):validation/outputs/Germany/c2t_deggendorf" \
  --output validation/outputs/combined_report.md

# Report written to: validation/outputs/combined_report.md

Open combined_report.md — all four tables are ready to copy into the paper. The Key Takeaways prose sentences can be used directly in the Discussion section.


What the Script Does Not Do

  • It does not re-run validation — it reads existing CSV outputs only.
  • It does not generate plots — the individual per-dataset plots are already in each plots/ subfolder.
  • It does not modify any source data or database.