Validation snapshot

The repository includes runnable benchmarks and research previews, each with a different evidence boundary. Each row states what the benchmark demonstrates and where its evidence stops. These boundaries are deliberate and follow a verification-first approach.

| Benchmark / Preview | What it shows | Evidence boundary |
| --- | --- | --- |
| Information-loss-guided subcatchment partition | QGIS-to-Agentic SWMM preprocessing using entropy and fuzzy-similarity concepts from Zhang & Valeo's Journal of Hydrology paper | GIS preprocessing concept, not a calibrated SWMM performance claim |
| Raw GeoPackage-to-INP benchmark | Public TUFLOW GeoPackage layers converted into SWMM-ready artifacts, QA, and audit | Structured raw GIS path, not arbitrary CAD/GIS recognition |
| Prepared-input SWMM benchmark | External 40-subcatchment Tecnopolo model execution, plotting, and direct swmm5 comparison. v0.7.1 re-verification: model.out SHA256 unchanged across the v0.7.0 → v0.7.1 minor revision, with an 11-word natural-language prompt now sufficient to drive the full run-audit-plot chain end-to-end. | Prepared INP validation path |
| Cross-session memory autonomously activated | On a real production natural-language run, the LLM planner consulted prior session history without any user instruction to do so (see the v0.7.1 cross-session memory evidence) | Memory layer fires correctly and shapes planner decisions on a real natural-language run; staleness weighting and negative-precedent handling are next-milestone scope |
| Prior Monte Carlo uncertainty smoke | Tecnopolo HORTON parameter perturbation and hydrograph envelope preview | Prior uncertainty smoke, not calibration |
| Optional INP-derived raw adapter benchmark | Raw-like inputs extracted from a public SWMM fixture and rebuilt through the modular path | Adapter handoff check, not greenfield watershed generation |

Figure: Information-loss-guided subcatchment partition. Entropy and fuzzy-similarity preprocessing of QGIS layers into SWMM-ready subcatchments. A GIS preprocessing concept, not a calibrated SWMM performance claim.
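
To make the "information loss" idea concrete, here is a minimal sketch using plain Shannon entropy. It is not the formulation from the Zhang & Valeo paper and not the repository's implementation. The function names, the use of land-cover class fractions, and the area weighting are all assumptions made for illustration. The loss is taken as the entropy of the merged unit minus the area-weighted mean entropy of its parts, which is zero when the parts are identical and grows as they differ.

```python
import math


def shannon_entropy(fractions):
    """Shannon entropy in bits of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in fractions if p > 0)


def merge_information_loss(areas, class_fractions):
    """Entropy gained (detail lost) when several units are merged into one.

    areas           -- area of each candidate unit (any consistent unit)
    class_fractions -- per-unit list of class fractions, each summing to 1
    Both inputs are hypothetical stand-ins for attributes read from GIS layers.
    """
    total = sum(areas)
    weights = [a / total for a in areas]
    n_classes = len(class_fractions[0])
    # Class mix of the merged unit is the area-weighted average of the parts.
    merged = [
        sum(w * f[k] for w, f in zip(weights, class_fractions))
        for k in range(n_classes)
    ]
    within = sum(w * shannon_entropy(f) for w, f in zip(weights, class_fractions))
    return shannon_entropy(merged) - within
```

Under this sketch, a partitioning step would prefer merges with low loss, since those discard little of the spatial contrast between units.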
Figure: Tecnopolo Monte Carlo hydrograph uncertainty envelope. HORTON parameter perturbation producing a flow envelope preview. A prior uncertainty smoke, not calibration.
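
The sketch below illustrates the general shape of a prior Monte Carlo envelope. It does not call SWMM and is not the benchmark's code. It samples Horton parameters from uniform priors, applies the standard Horton capacity curve f(t) = fc + (f0 - fc)·exp(-k·t), and takes the per-step minimum and maximum of a simple rainfall-excess series. The prior ranges, the hyetograph, and the function names are assumptions chosen only for illustration.

```python
import math
import random


def horton_capacity(t_hr, f0, fc, k):
    """Horton infiltration capacity (mm/h) at time t_hr (hours)."""
    return fc + (f0 - fc) * math.exp(-k * t_hr)


def excess_envelope(times_hr, rain_mm_hr, n_samples=200, seed=0):
    """Return (lower, upper) per-step bounds of rainfall excess over prior samples."""
    rng = random.Random(seed)  # seeded so the preview is reproducible
    runs = []
    for _ in range(n_samples):
        # Hypothetical uniform prior ranges, not calibrated values.
        f0 = rng.uniform(50.0, 100.0)  # initial capacity, mm/h
        fc = rng.uniform(5.0, 15.0)    # final capacity, mm/h
        k = rng.uniform(2.0, 6.0)      # decay constant, 1/h
        runs.append([
            max(r - horton_capacity(t, f0, fc, k), 0.0)
            for t, r in zip(times_hr, rain_mm_hr)
        ])
    lower = [min(step) for step in zip(*runs)]
    upper = [max(step) for step in zip(*runs)]
    return lower, upper


# Illustrative hyetograph (mm/h) at half-hour steps; not the Tecnopolo event.
times = [0.0, 0.5, 1.0, 1.5]
rain = [20.0, 60.0, 40.0, 10.0]
lower, upper = excess_envelope(times, rain)
```

Because the parameters come from priors rather than from fitting to observations, the resulting band only shows sensitivity to the assumed ranges, which is why the benchmark is labelled a smoke test and not calibration.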
Figure: Tecnopolo rainfall-runoff at node J22 for the January 1994 event. Rainfall and runoff for the prepared 40-subcatchment Tecnopolo model along the prepared INP validation path.
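
The v0.7.1 re-verification in the table rests on comparing the SHA256 digest of model.out between two runs. A minimal way to perform that kind of check with the Python standard library is sketched here. The helper names and any paths passed to them are hypothetical, and the repository may implement the comparison differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA256 so large binary outputs are not loaded at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(path_a, path_b):
    """True when both files are byte-identical according to their SHA256 digests."""
    return sha256_of(path_a) == sha256_of(path_b)
```

A matching digest shows that the two output files are byte-identical. It says nothing about whether the model itself is correct, which is consistent with the "prepared INP validation path" boundary.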

Audit and research memory

The audit layer consolidates artifacts, QA checks, and metric provenance into an Obsidian-compatible experiment note. In the example below, the audit catches a recorded peak-flow value that does not match the value re-parsed from the SWMM report source section.

Figure: Experiment audit comparison. A peak-flow provenance mismatch surfaced when the recorded value is re-parsed against the SWMM report source section.
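
A provenance check of this kind can be sketched as: re-read the value from the source text, then compare it to what was recorded. The example below assumes a deliberately simplified section layout in which each line starts with a node ID and ends with its peak flow. That layout, the function names, and the tolerance are hypothetical. A real SWMM report section has more columns, and the repository's parser is not shown in this document.

```python
import math


def reparse_peak_flow(report_section, node_id):
    """Return the peak flow for node_id from a simplified report section.

    Assumes (hypothetically) that the node ID is the first field on its line
    and the peak flow is the last field.
    """
    for line in report_section.splitlines():
        fields = line.split()
        if fields and fields[0] == node_id:
            return float(fields[-1])
    raise KeyError(f"node {node_id!r} not found in report section")


def check_provenance(recorded, report_section, node_id, rel_tol=1e-6):
    """Compare a recorded metric against the value re-parsed from its source."""
    source = reparse_peak_flow(report_section, node_id)
    return {
        "node": node_id,
        "recorded": recorded,
        "source": source,
        "match": math.isclose(recorded, source, rel_tol=rel_tol),
    }
```

Returning both values alongside the match flag keeps the audit note self-explanatory: a reader can see what was recorded, what the source says, and that they disagree, without rerunning the model.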

The downstream modelling-memory layer can summarize audited run histories into recurring failure patterns, assumptions, missing evidence, QA issues, lessons learned, and controlled proposals for updating existing skills or creating new skills. Because skills drive the workflow, these proposals stay coupled to the current Agentic SWMM framework and still require human review and benchmark verification before acceptance.
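
As a rough illustration of summarizing run histories into recurring patterns, the sketch below counts how many audited runs report each failure pattern and keeps those that recur. The record shape (a `failure_patterns` list per run), the threshold, and the status label are assumptions, not the framework's actual schema. Every returned item is marked as needing human review, mirroring the acceptance rule described above.

```python
from collections import Counter


def recurring_patterns(run_histories, min_runs=2):
    """List failure patterns seen in at least min_runs audited runs.

    run_histories -- iterable of dicts with an optional "failure_patterns" list
                     (a hypothetical record shape used only for this sketch)
    """
    counts = Counter()
    for run in run_histories:
        # A set ensures a pattern repeated within one run is counted once.
        counts.update(set(run.get("failure_patterns", [])))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {"pattern": pattern, "runs": n, "status": "needs_human_review"}
        for pattern, n in ranked
        if n >= min_runs
    ]
```

Counting per run instead of per occurrence keeps one noisy run from dominating the summary, and the fixed sort order makes the output stable across sessions.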