---
title: Be Civic — Internal Build-Tools Specification
type: spec
status: v0.1 — extracted from product spec on 2026-05-12
date: 2026-05-12
parent_spec: ./README.md
sibling_specs:
  - architecture.md
  - protocol.md
  - schemas.md
  - privacy.md
  - lifecycle.md
  - skills.md
  - website.md
tags: ["be-civic", "bc-internal", "build-tools"]
---

# Be Civic — Internal Build-Tools Specification

This sub-spec covers internal operations tooling used by the operator to author and maintain the Be Civic corpus. It is distinct from the **product spec** (the other sub-spec docs), which describes the protocol, schemas, lifecycle, and runtime behaviour that customers' agents see.

Internal build-tool content lives here because spec changes still need to be coordinated with build-tool changes. When the product spec moves, the build tools must move with it. This file is the bridge: a single place to read what the build tools produce, what artefact shapes they emit, and which product-spec sections they support.

The canonical authoring references for `bc-corpus-creator` live at `bc-skills/bc-corpus-creator/references/` (research-report shape, evals shape, voice and style, canonical shape, rubric, schema descriptors). This file documents the **artefact shapes** those references produce and the **product-corpus integration points** (what files ship in `bc-docs/skills//`, how the renderer treats them, how PR-CI gates them).

## 1. Tools in scope

| Tool | Role | Source location | Canonical references |
|---|---|---|---|
| `bc-corpus-creator` | Operator-driven walks of corpus skills end-to-end; produces canonical.md, research-report.md, and optionally evals.json | `bc-skills/bc-corpus-creator/` | `bc-skills/bc-corpus-creator/references/` |

Future build tools (catalogue authoring, path-directory enrichment, harness-build helpers, etc.) are added to this table as they emerge.

## 2. Shipped artefacts per skill

Each skill in `bc-docs/skills//` ships three artefacts:

- **`canonical.md`** — the customer-facing skill body. Product spec: `schemas.md §6.1`.
- **`research-report.md`** — the durable evidence record produced by the walking-procedure. Spec: §3 of this document.
- **`evals.json`** — the test prompts and per-prompt expectations driving the opt-in benchmark mode. Spec: §4 of this document.

The skill-PR classifier path filter at `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$` (`lifecycle.md §10.1`) covers all three. The renderer surfaces `canonical.md` to customer agents; `research-report.md` and `evals.json` are operator-internal artefacts but ship with the corpus for audit and re-rendering purposes.

## 3. research-report.md schema

Every walked skill ships a `research-report.md` alongside its `canonical.md` at `bc-docs/skills//research-report.md`. The research-report is the durable evidence record produced by the walking-procedure (`skills.md §15.1`). It carries the source base, structural decisions, catalogue-extraction proposals, and a §11 failure-modes catalog that the canonical body never fully surfaces. The translator reads this file to render `canonical.md`; the migrator re-reads it on schema-collapse to re-render without re-walking.

The artefact is shipped with the corpus and covered by the skill-PR classifier path filter (`lifecycle.md §10.1`) at `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$`.

**Frontmatter.** YAML, all fields present unless marked optional:

```yaml
---
title: Research report for
type: research-report
status: draft | complete
walked: YYYY-MM-DD               # walk date
walker: bc-corpus-creator v      # tooling identifier
canonical_version:               # mirrors `version` on canonical.md
                                 # (same artifact; same versioning scheme — §6.1 cohort reset
                                 # applies). A research-report backs exactly one canonical
                                 # version at a time; canonical-version bumps require either
                                 # translator re-render or research-report update + retranslate.
spec_round: round-               # spec generation this report was structured against
                                 # (distinct from the skill `schema_version: 3` integer
                                 # field — different namespace, different semantics).
                                 # Resolves against
                                 # references/schema-descriptors/.md
research_complete: true | false  # true ⇒ all §3–§8 sources gathered to researcher
                                 # self-assessment; false ⇒ reverse-translated draft or
                                 # incomplete walk. Migrate refuses to proceed unless true.
last_updated: YYYY-MM-DD         # bumped on any post-walk research-update; canonical_version
                                 # may or may not bump in parallel (research-report can carry
                                 # findings not yet promoted into canonical)
sources_count:                   # total §3 rows
sources_by_usage:                # rolls up the `usage` column on §3
  citation:
  corroboration:
  failure_mode_context:
  comparative:
volatile_values_proposed:        # §4 row count
references_proposed:             # §5 row count
authorities_proposed:            # §6 row count
failure_modes_documented:        # §11 row count
new_enum_values:                 # optional; proposed additions to schemas/types.json enums
  - :                            # e.g. sponsor_type:au-treaty-secondment
researcher_adequacy_note: |      # required; free-form 4–8 sentence self-assessment
  Named source-class coverage achieved, depth of probe, what was deliberately
  skipped (and why), confidence level on hand-off to translator. Operator reads
  this at the Phase 3→4 acceptance gate and decides whether the coverage is
  sufficient. No hard coverage gate — thoroughness is enforced by researcher.md
  instructions and operator judgment, not deterministic rubric items.
---
```

**Body sections** (fixed 11-section schema; the migrator depends on this being stable):

1. **Reconnaissance** — skeleton scan, sibling drafts, stop nodes.
2. **Structural decision** — absorption / extraction / stop / deferral, with reasoning.
3. **Sources consulted** — table per source: URL, fetch date, `citation_grade` (statutory / federal / regional / consular / professional / origin / secondary), `usage` enum, quoted passages.

   **`usage` enum** — distinguishes how each source feeds the corpus, orthogonal to `citation_grade`:

   - `citation` — cited in canonical body via `` tag. MUST resolve to a citation-grade source (classes 1–6).
   - `corroboration` — confirms a canonical claim but is NOT cited. Used when a non-canonical-grade source (law-firm post, news coverage, forum thread) independently validates statute or agency guidance. Builds researcher confidence; never surfaces to the user-agent.
   - `failure-mode-context` — source documents a procedural failure mode captured in §11. Anything from forum thread to news investigation to academic paper.
   - `comparative` — other-jurisdiction analog or comparison point that informed structural decisions (§2) but isn't load-bearing for this skill's body. Optional; v1 walks may carry zero.

   The canonical body's `` tags resolve only to `usage: citation` rows; §11 catalog rows draw their `evidence_sources[]` only from `usage: failure-mode-context` rows. The reviewer rubric's Tag-trace item gates on this invariant.
4. **Volatile values surfaced** — table: `name`, `value`, `unit`, `source_ref`, `last_verified`, `indexation_date`. Field set matches the volatile-values catalogue row schema (§6.3).
5. **References surfaced** — table: `name`, `title`, `url`, `last_verified`. Field set matches the references catalogue (§6.10, §6.11).
6. **Authorities surfaced** — table: `id`, `type`, `name`, `url`, `NIS5` (if commune). Field set matches `data/authorities.json`. No free-form "contact pointer" — align to the catalogue.
7. **Catalogue extraction proposals** — new rows vs existing rows referenced.
8. **Type-enum additions** — new `sponsor_type` / `route` / `region` / `relationship_type` values proposed for `schemas/types.json`.
9. **Open questions** — not pinned, needs operator review.
   Doubles as the **concern-promotion watchlist** post-launch: when a runtime `concern` submission (renamed from `observation` per the 2026-05-15 taxonomy normalization) matches an open question semantically, that is the trigger to promote the resolution into the canonical body and close the entry.
10. **Out of scope** — flagged for adjacent walks.
11. **Common failure modes & procedural pitfalls** — what other sources document about how this procedure goes wrong in practice. Distinct from walker-level pitfalls (which concern the walk, not the procedure). Row schema:

    - `pattern` — one-line summary of the failure mode (e.g., "Commune rejects birth certificate without apostille on first submission").
    - `evidence_sources[]` — pointers to §3 rows carrying `usage: failure-mode-context`.
    - `severity` — `blocks-procedure` / `delays-procedure` / `cosmetic`.
    - `observed_by` — `forum` / `news` / `law-firm-blog` / `academic` / `government-report` / `operator-experience`.
    - `predicted_concern_keywords[]` — phrases an incoming user-agent `concern` submission might use to describe this failure (seeds the future concern-match mechanism for the deferred `promote` mode). Field name renamed from `predicted_observation_keywords[]` per the 2026-05-15 taxonomy normalization.
    - `canonical_anchor` — pointer to where this failure mode is reflected in the canonical body (e.g., `process#step-4-fee-payment`, `required-documents#birth-certificate`, `known-surprises` per §6.1 (see schemas.md) SG1 reversal). Empty string when the failure mode is not yet surfaced in canonical. The deferred `promote` mode reads this field to know where in the body to insert the surfaced text.
    - `promotion_state` — `proposed` (not yet validated by a runtime `concern`) / `promoted` (a real concern confirmed; surfaced in canonical) / `deprecated` (promoted but later removed; policy changed) / `contradicted` (new evidence says this failure mode is wrong).

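
The row schema above can be mirrored in a small typed record for linting. The following is a minimal sketch in Python, assuming a hypothetical in-memory representation: the `FailureModeRow` class, the `check_row` helper, and the `s7`-style ids for §3 source rows are illustrative names, not part of the bc-corpus-creator tooling. The enum values are copied from the row schema; the sample row's severity, observer, and keywords are made up for illustration.

```python
from dataclasses import dataclass, field

# Closed value sets copied from the §11 row schema.
SEVERITIES = {"blocks-procedure", "delays-procedure", "cosmetic"}
OBSERVED_BY = {"forum", "news", "law-firm-blog", "academic",
               "government-report", "operator-experience"}
PROMOTION_STATES = {"proposed", "promoted", "deprecated", "contradicted"}


@dataclass
class FailureModeRow:
    """Hypothetical in-memory form of one §11 catalog row."""
    pattern: str
    evidence_sources: list[str]           # ids of §3 rows (assumed id scheme)
    severity: str
    observed_by: str
    predicted_concern_keywords: list[str] = field(default_factory=list)
    canonical_anchor: str = ""            # empty when not yet surfaced in canonical
    promotion_state: str = "proposed"


def check_row(row: FailureModeRow, source_usage: dict[str, str]) -> list[str]:
    """Return a list of problems; source_usage maps a §3 row id to its usage value."""
    problems = []
    if row.severity not in SEVERITIES:
        problems.append(f"unknown severity: {row.severity}")
    if row.observed_by not in OBSERVED_BY:
        problems.append(f"unknown observed_by: {row.observed_by}")
    if row.promotion_state not in PROMOTION_STATES:
        problems.append(f"unknown promotion_state: {row.promotion_state}")
    # §3 invariant: evidence may only point at failure-mode-context sources.
    for ref in row.evidence_sources:
        if source_usage.get(ref) != "failure-mode-context":
            problems.append(f"{ref} is not a failure-mode-context source")
    return problems


# Illustrative data only.
source_usage = {"s1": "citation", "s7": "failure-mode-context", "s8": "failure-mode-context"}
row = FailureModeRow(
    pattern="Commune rejects birth certificate without apostille on first submission",
    evidence_sources=["s7", "s8"],
    severity="delays-procedure",
    observed_by="forum",
    predicted_concern_keywords=["apostille"],
    canonical_anchor="required-documents#birth-certificate",
)
print(check_row(row, source_usage))
```

An empty list means the row uses only known enum values and every evidence pointer lands on a `usage: failure-mode-context` source.
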

**Inclusion rule** (per §15.3 (see skills.md)): a failure mode can be included only if (a) ≥3 independent reports across signal/secondary sources, OR (b) cross-referenced against a primary source. The researcher MUST enforce this; `evidence_sources[]` length is the lower bound on independence (multiple `usage: failure-mode-context` rows must point to different sources, not the same source three times). The catalog is the **watchlist** — rows at `promotion_state: proposed` are candidates for canonical promotion when a real `concern` submission confirms them.

**Versioning and lifecycle.** The research-report is a **living document** — `last_updated` bumps on every post-walk research-update mode run; `canonical_version` mirrors the canonical it currently backs. Catalogue rows the researcher proposes flow into D1 via the catalogue-extractor (separate from the research-report file itself); §11 rows live only in the research-report until the deferred `promote` mode surfaces them into canonical.

**Translator contract.** The translator reads research-report §1–§10 plus the voice / canonical-shape / schemas references and produces canonical.md. If the schema collapses or the canonical shape changes, the migrator re-reads §1–§10 and re-renders canonical conformant to the new schema, without re-running research.

## 4. evals.json schema

`evals.json` is the third shipped artefact per skill, alongside `canonical.md` (§6.1 in schemas.md) and `research-report.md` (§3 of this document). It carries the test prompts and per-prompt expectations that drive the opt-in benchmark mode of the walking-procedure tooling. Authored by the operator (or seeded by the walker on first walk completion), committed at `bc-docs/skills//evals.json`. The skill-PR classifier path filter (§10.1 (see lifecycle.md)) covers it via the amended pattern `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$`.
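
To see what the path filter pattern quoted above accepts, here is a minimal sketch in Python using the standard `re` module. The regular expression is copied verbatim from the text; the `is_skill_artefact` helper and the `example-skill` folder name are hypothetical, and how the real classifier applies the pattern is not specified here.

```python
import re

# Pattern copied verbatim from the spec text.
SKILL_ARTEFACT_FILTER = re.compile(
    r"^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$"
)


def is_skill_artefact(path: str) -> bool:
    """True when a repo-relative path matches the skill-PR path filter."""
    return SKILL_ARTEFACT_FILTER.match(path) is not None


# Hypothetical paths, for illustration only.
for path in [
    "skills/example-skill/canonical.md",
    "skills/example-skill/research-report.md",
    "skills/example-skill/evals.json",
    "skills/example-skill/evals.md",
    "skills/example-skill/notes.md",
    "skills/example-skill/sub/evals.json",
    "bc-docs/skills/example-skill/evals.json",
]:
    print(path, is_skill_artefact(path))
```

Two properties follow from the pattern as written. It is anchored at `skills/`, so a path that carries the `bc-docs/` prefix does not match. The name and extension alternations are independent, so `evals.md` and `canonical.json` also match even though no such artefacts are described.
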

**Shape:**

```json
{
  "skill_id": "",
  "evals_version": "",
  "prompts": [
    {
      "id": "",
      "prompt": "",
      "expectations": [
        "",
        "..."
      ],
      "tool_call_expectations": [
        "",
        "..."
      ]
    }
  ]
}
```

**Field semantics:**

- `skill_id` — kebab-case skill id; MUST match the folder name (`skills//evals.json`).
- `evals_version` — semver of the evals.json shape. Independent of `canonical_version` and `schema_version`; bumps when the evals harness's contract changes (new field shapes, new expectation types). v1 baseline: `0.1`.
- `prompts[]` — array of test prompts. Each prompt is independently runnable; the benchmark mode iterates the array and grades each in isolation.
- `prompts[].id` — stable identifier per prompt (kebab-case, deterministic). Allows the benchmark-grader to track per-prompt pass/fail over time and tie regressions to specific scenarios.
- `prompts[].prompt` — the user-shaped prompt the runner sends to a fresh subagent loaded with the skill's canonical. Should reflect realistic agent invocations rather than synthetic edge probes.
- `prompts[].expectations[]` — LLM-judged assertions the response must satisfy. The benchmark-grader scores each one with pass/fail plus evidence quoted from the response.
- `prompts[].tool_call_expectations[]` — LLM-judged assertions about the runner's tool-call pattern (e.g., "should call read_skill('X') at least once", "should NOT redirect mid-flow"). Scored alongside `expectations[]`.

**Lifecycle.** The walker seeds evals.json on first walk completion with a minimal baseline prompt set; the operator extends it manually as new edge cases surface. evals.json is versioned with the canonical (same git history; same PR landings). It is NOT a generated artefact — it ships as authored.

**Out of scope for v1.** evals.json does not carry runtime metadata (latency budgets, tokens-in/tokens-out targets) — those are runner-side concerns recorded in the benchmark output, not in the eval definitions themselves.
Per-prompt baseline-without-canonical comparison is the runner's mode (`baseline: true`); evals.json does not encode the comparison expectation directly.

## 5. Product-spec integration points

The build-tool artefacts described above interact with the product spec at the following points. When the product spec changes, the corresponding build-tool reference at `bc-skills/bc-corpus-creator/references/` must be updated to match.

| Build-tool artefact | Product-spec reference | Notes |
|---|---|---|
| `canonical.md` shape | `schemas.md §6.1` (skill schema) | The product spec is authoritative; bc-corpus-creator's `references/canonical-shape.md` is the build-time mirror. |
| Walking-procedure steps | `skills.md §15.1` | Manual walks per `skills.md §15.1` remain valid. bc-corpus-creator operationalises the same steps. |
| Source classes | `skills.md §15.2` | The closed enum + grade values are in the product spec. The `usage` enum on each source row (citation / corroboration / failure-mode-context / comparative) is internal to research-report and not surfaced in the product corpus. |
| Failure-mode inclusion rules | `skills.md §15.3` | research-report §11 carries the catalog; canonical body surfaces what it must. |
| Skill-PR classifier path filter | `lifecycle.md §10.1` | Covers `canonical\|research-report\|evals` artefacts uniformly. |

## 6. Process for build-tool spec updates

Changes to research-report.md or evals.json schemas follow the same amendment-proposal workflow as the product spec (`amendment-proposals/README.md`).
A change to the build-tool artefact shape MAY require a corresponding update to the product spec when:

- The artefact's product-spec integration points (§5 above) shift
- The skill-PR classifier path filter needs to change
- A new artefact is introduced that ships in `bc-docs/skills//`

Build-tool-only changes (rubric refinement, voice-and-style updates, internal agent prompts) do not require product-spec amendments; they are committed directly to `bc-skills/bc-corpus-creator/references/` with operator review.