---
title: Be Civic — Internal Build-Tools Specification
type: spec
status: v0.1 — extracted from product spec on 2026-05-12
date: 2026-05-12
parent_spec: ./README.md
sibling_specs:
  - architecture.md
  - protocol.md
  - schemas.md
  - privacy.md
  - lifecycle.md
  - skills.md
  - website.md
tags: ["be-civic", "bc-internal", "build-tools"]
---

# Be Civic — Internal Build-Tools Specification

This sub-spec covers internal operations tooling used by the operator to author and maintain the Be Civic corpus. It is distinct from the **product spec** (the other sub-spec docs), which describes the protocol, schemas, lifecycle, and runtime behaviour that customers' agents see.

Internal build-tool content lives here because spec changes still need to be coordinated with build-tool changes. When the product spec moves, the build tools must move with it. This file is the bridge: a single place to read what the build tools produce, what artefact shapes they emit, and which product-spec sections they support.

The canonical authoring references for `bc-corpus-creator` live at `bc-skills/bc-corpus-creator/references/` (research-report shape, evals shape, voice and style, canonical shape, rubric, schema descriptors). This file documents the **artefact shapes** those references produce and the **product-corpus integration points** (what files ship in `bc-docs/skills/<id>/`, how the renderer treats them, how PR-CI gates them).

## 1. Tools in scope

| Tool | Role | Source location | Canonical references |
|---|---|---|---|
| `bc-corpus-creator` | Operator-driven walks of corpus skills end-to-end; produces canonical.md, research-report.md, and optionally evals.json | `bc-skills/bc-corpus-creator/` | `bc-skills/bc-corpus-creator/references/` |

Future build tools (catalogue authoring, path-directory enrichment, harness-build helpers, etc.) are added to this table as they emerge.

## 2. Shipped artefacts per skill

Each skill in `bc-docs/skills/<id>/` ships three artefacts:

- **`canonical.md`** — the customer-facing skill body. Product spec: `schemas.md §6.1`.
- **`research-report.md`** — the durable evidence record produced by the walking-procedure. Spec: §3 of this document.
- **`evals.json`** — the test prompts and per-prompt expectations driving the opt-in benchmark mode. Spec: §4 of this document.

The skill-PR classifier path filter at `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$` (`lifecycle.md §10.1`) covers all three. The renderer surfaces `canonical.md` to customer agents; `research-report.md` and `evals.json` are operator-internal artefacts but ship with the corpus for audit and re-rendering purposes.
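
As a minimal sketch of how the path filter behaves, the pattern can be exercised with Python's `re` module. The pattern is copied verbatim from the paragraph above; the helper name `is_skill_artefact` and the sample paths are illustrative assumptions, not part of the classifier.

```python
import re

# Pattern copied verbatim from the skill-PR classifier path filter.
SKILL_ARTEFACT = re.compile(
    r"^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$"
)


def is_skill_artefact(path: str) -> bool:
    """Return True when a repo-relative path falls under the filter."""
    return SKILL_ARTEFACT.match(path) is not None


# Hypothetical paths, for illustration only.
samples = [
    "skills/example-skill/canonical.md",
    "skills/example-skill/research-report.md",
    "skills/example-skill/evals.json",
    "skills/example-skill/notes.md",          # not one of the three artefacts
    "skills/example-skill/sub/canonical.md",  # nested one level too deep
]
for path in samples:
    print(path, is_skill_artefact(path))
```

The stem and extension groups are independent, so the pattern as written also accepts combinations such as `canonical.json`; whether that is intended is a question for `lifecycle.md §10.1`.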

## 3. research-report.md schema

Every walked skill ships a `research-report.md` alongside its `canonical.md`, at `bc-docs/skills/<id>/research-report.md`. The research-report is the durable evidence record produced by the walking-procedure (`skills.md §15.1`). It carries the source base, structural decisions, catalogue-extraction proposals, and a failure-modes catalog (report §11) that the canonical body never fully surfaces. The translator reads this file to render `canonical.md`; the migrator re-reads it on schema-collapse to re-render without re-walking. The artefact ships with the corpus and is covered by the skill-PR classifier path filter `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$` (`lifecycle.md §10.1`).

Within this section, bare references to §1 through §11 denote body sections of the research-report, not sections of this document.

**Frontmatter.** YAML, all fields present unless marked optional:

```yaml
---
title: Research report for <skill-id>
type: research-report
status: draft | complete
walked: YYYY-MM-DD                         # walk date
walker: bc-corpus-creator v<semver>        # tooling identifier
canonical_version: <semver>                # mirrors `version` on canonical.md
                                           # (same artefact, same versioning scheme; the schemas.md §6.1
                                           # cohort reset applies). A research-report backs exactly one
                                           # canonical version at a time; canonical-version bumps require
                                           # either translator re-render or research-report update + retranslate.
spec_round: round-<N>                      # spec generation this report was structured against
                                           # (distinct from the skill `schema_version: 3` integer
                                           # field — different namespace, different semantics).
                                           # Resolves against
                                           # references/schema-descriptors/<spec_round>.md
research_complete: true | false            # true ⇒ all §3–§8 sources gathered, per the researcher's
                                           # self-assessment; false ⇒ reverse-translated draft or
                                           # incomplete walk. The migrator refuses to proceed unless true.
last_updated: YYYY-MM-DD                   # bumped on any post-walk research-update; canonical_version
                                           # may or may not bump in parallel (research-report can carry
                                           # findings not yet promoted into canonical)
sources_count: <integer>                   # total §3 rows
sources_by_usage:                          # rolls up the `usage` column on §3
  citation: <integer>
  corroboration: <integer>
  failure_mode_context: <integer>          # counts rows with `usage: failure-mode-context`
  comparative: <integer>
volatile_values_proposed: <integer>        # §4 row count
references_proposed: <integer>             # §5 row count
authorities_proposed: <integer>            # §6 row count
failure_modes_documented: <integer>        # §11 row count
new_enum_values:                           # optional; proposed additions to schemas/types.json enums
  - <enum>:<value>                         # e.g. sponsor_type:au-treaty-secondment
researcher_adequacy_note: |                # required; free-form 4–8 sentence self-assessment
  Named source-class coverage achieved, depth of probe, what was deliberately
  skipped (and why), confidence level on hand-off to translator. Operator
  reads this at the Phase 3→4 acceptance gate and decides whether the
  coverage is sufficient. No hard coverage gate — thoroughness is enforced
  by researcher.md instructions and operator judgment, not deterministic
  rubric items.
---
```
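
The count fields are meant to mirror the body tables, so they can be cross-checked mechanically. The sketch below assumes the frontmatter has already been parsed into a dict and that every §3 row carries exactly one `usage` value, so the `sources_by_usage` counts sum to `sources_count`. The function name `check_counts` and the sample values are hypothetical.

```python
# Keys as they appear under `sources_by_usage` in the frontmatter.
USAGE_KEYS = ("citation", "corroboration", "failure_mode_context", "comparative")


def check_counts(frontmatter: dict) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    by_usage = frontmatter.get("sources_by_usage", {})

    missing = [key for key in USAGE_KEYS if key not in by_usage]
    if missing:
        problems.append(f"sources_by_usage is missing keys: {missing}")

    # Assumption: one usage value per §3 row, so the roll-up sums to the total.
    total = sum(by_usage.get(key, 0) for key in USAGE_KEYS)
    declared = frontmatter.get("sources_count")
    if total != declared:
        problems.append(
            f"sources_count={declared} but sources_by_usage sums to {total}"
        )
    return problems


# Hypothetical values, for illustration only.
example = {
    "sources_count": 12,
    "sources_by_usage": {
        "citation": 6,
        "corroboration": 3,
        "failure_mode_context": 2,
        "comparative": 1,
    },
}
print(check_counts(example))  # []
```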

**Body sections** (fixed 11-section schema; the migrator depends on this being stable):

1. **Reconnaissance** — skeleton scan, sibling drafts, stop nodes.
2. **Structural decision** — absorption / extraction / stop / deferral, with reasoning.
3. **Sources consulted** — table, one row per source: URL, fetch date, `citation_grade` (statutory / federal / regional / consular / professional / origin / secondary), `usage` enum, quoted passages.

   **`usage` enum** — distinguishes how each source feeds the corpus, orthogonal to `citation_grade`:
   - `citation` — cited in canonical body via `<Ref>` tag. MUST resolve to a citation-grade source (classes 1–6).
   - `corroboration` — confirms a canonical claim but is NOT cited. Used when a non-citation-grade source (law-firm post, news coverage, forum thread) independently validates statute or agency guidance. Builds researcher confidence; never surfaces to the user-agent.
   - `failure-mode-context` — source documents a procedural failure mode captured in §11. The source can be anything from a forum thread to a news investigation to an academic paper.
   - `comparative` — other-jurisdiction analog or comparison point that informed structural decisions (§2) but isn't load-bearing for this skill's body. Optional; v1 walks may carry zero.

   The canonical body's `<Ref>` tags resolve only to `usage: citation` rows; §11 catalog rows draw their `evidence_sources[]` only from `usage: failure-mode-context` rows. The reviewer rubric's Tag-trace item gates on this invariant.
4. **Volatile values surfaced** — table: `name`, `value`, `unit`, `source_ref`, `last_verified`, `indexation_date`. Field set matches the volatile-values catalogue row schema (`schemas.md §6.3`).
5. **References surfaced** — table: `name`, `title`, `url`, `last_verified`. Field set matches the references catalogue (`schemas.md §6.10`, `§6.11`).
6. **Authorities surfaced** — table: `id`, `type`, `name`, `url`, `NIS5` (if commune). Field set matches `data/authorities.json`. No free-form "contact pointer" — align to the catalogue.
7. **Catalogue extraction proposals** — new rows vs existing rows referenced.
8. **Type-enum additions** — new `sponsor_type` / `route` / `region` / `relationship_type` values proposed for `schemas/types.json`.
9. **Open questions** — not pinned, needs operator review. Doubles as the **concern-promotion watchlist** post-launch: when a runtime `concern` submission (renamed from `observation` per the 2026-05-15 taxonomy normalization) matches an open question semantically, that is the trigger to promote the resolution into the canonical body and close the entry.
10. **Out of scope** — flagged for adjacent walks.
11. **Common failure modes & procedural pitfalls** — what other sources document about how this procedure goes wrong in practice. Distinct from walker-level pitfalls (which concern the walk, not the procedure). Row schema:
    - `pattern` — one-line summary of the failure mode (e.g., "Commune rejects birth certificate without apostille on first submission").
    - `evidence_sources[]` — pointers to §3 rows carrying `usage: failure-mode-context`.
    - `severity` — `blocks-procedure` / `delays-procedure` / `cosmetic`.
    - `observed_by` — `forum` / `news` / `law-firm-blog` / `academic` / `government-report` / `operator-experience`.
    - `predicted_concern_keywords[]` — phrases an incoming user-agent `concern` submission might use to describe this failure (seeds the future concern-match mechanism for the deferred `promote` mode). Field name renamed from `predicted_observation_keywords[]` per the 2026-05-15 taxonomy normalization.
    - `canonical_anchor` — pointer to where this failure mode is reflected in the canonical body (e.g., `process#step-4-fee-payment`, `required-documents#birth-certificate`, or `known-surprises` per the `schemas.md §6.1` SG1 reversal). Empty string when the failure mode is not yet surfaced in canonical. The deferred `promote` mode reads this field to know where in the body to insert the surfaced text.
    - `promotion_state` — `proposed` (not yet validated by a runtime `concern`) / `promoted` (a real concern confirmed; surfaced in canonical) / `deprecated` (promoted but later removed; policy changed) / `contradicted` (new evidence says this failure mode is wrong).

    **Inclusion rule** (per `skills.md §15.3`): a failure mode can be included only if it has (a) ≥3 independent reports across signal/secondary sources, OR (b) a cross-reference against a primary source. The researcher MUST enforce this. Length alone does not establish independence: `evidence_sources[]` needs at least three entries under (a), and those `usage: failure-mode-context` rows must point to different sources, not the same source three times.

    The catalog is the **watchlist** — rows at `promotion_state: proposed` are candidates for canonical promotion when a real `concern` submission confirms them.
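
Inclusion rule (a) and the `evidence_sources[]` invariant can be checked together. The following is a sketch under stated assumptions: §3 rows are held in a dict keyed by a hypothetical row id, independence is approximated by distinct URLs (real independence remains a researcher judgment), and rule (b) is not modelled. The function names and sample rows are illustrative only.

```python
def independent_sources(row: dict, sources: dict) -> set:
    """Return the distinct source URLs behind one §11 row.

    Raises ValueError when a pointer is dangling or targets a §3 row
    whose usage is not failure-mode-context.
    """
    urls = set()
    for ref in row["evidence_sources"]:
        source = sources.get(ref)
        if source is None:
            raise ValueError(f"unknown §3 row: {ref}")
        if source["usage"] != "failure-mode-context":
            raise ValueError(
                f"{ref} has usage {source['usage']!r}, expected failure-mode-context"
            )
        urls.add(source["url"])
    return urls


def meets_rule_a(row: dict, sources: dict, minimum: int = 3) -> bool:
    """Rule (a): at least `minimum` distinct sources, not just `minimum` rows."""
    return len(independent_sources(row, sources)) >= minimum


# Hypothetical §3 rows; s1 and s3 point at the same source.
sources = {
    "s1": {"url": "https://example.org/forum-thread", "usage": "failure-mode-context"},
    "s2": {"url": "https://example.org/news-report", "usage": "failure-mode-context"},
    "s3": {"url": "https://example.org/forum-thread", "usage": "failure-mode-context"},
    "s4": {"url": "https://example.org/academic-paper", "usage": "failure-mode-context"},
    "s5": {"url": "https://example.org/statute", "usage": "citation"},
}
weak = {"pattern": "example failure", "evidence_sources": ["s1", "s2", "s3"]}
strong = {"pattern": "example failure", "evidence_sources": ["s1", "s2", "s4"]}

print(meets_rule_a(weak, sources))    # False: three rows, two distinct sources
print(meets_rule_a(strong, sources))  # True
```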

**Versioning and lifecycle.** The research-report is a **living document** — `last_updated` bumps on every post-walk research-update mode run; `canonical_version` mirrors the canonical it currently backs. Catalogue rows the researcher proposes flow into D1 via the catalogue-extractor (separate from the research-report file itself); §11 rows live only in the research-report until the deferred `promote` mode surfaces them into canonical.

**Translator contract.** The translator reads research-report §1–§10 plus the voice / canonical-shape / schemas references and produces canonical.md. If the schema collapses or the canonical shape changes, the migrator re-reads §1–§10 and re-renders canonical conformant to the new schema, without re-running research.
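
The preconditions for a re-render can be sketched as a small gate. This assumes the frontmatter and body have already been parsed, with body sections held in a dict keyed by section number; the function name `migrator_inputs` and that representation are hypothetical, not the migrator's actual interface.

```python
# §1–§10 feed the translator; §11 stays in the research-report.
TRANSLATOR_SECTIONS = range(1, 11)


def migrator_inputs(frontmatter: dict, sections: dict) -> dict:
    """Return the sections a re-render would read, or raise if not ready."""
    if frontmatter.get("research_complete") is not True:
        raise ValueError("research_complete is not true; refusing to migrate")

    # The 11-section schema is fixed, so a missing section is an error.
    missing = [n for n in TRANSLATOR_SECTIONS if n not in sections]
    if missing:
        raise ValueError(f"research-report is missing sections: {missing}")

    return {n: sections[n] for n in TRANSLATOR_SECTIONS}


# Hypothetical parsed report with all eleven sections present.
report_sections = {n: f"body of section {n}" for n in range(1, 12)}
inputs = migrator_inputs({"research_complete": True}, report_sections)
print(sorted(inputs))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```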

## 4. evals.json schema

`evals.json` is the third shipped artefact per skill, alongside `canonical.md` (`schemas.md §6.1`) and `research-report.md` (§3 of this document). It carries the test prompts and per-prompt expectations that drive the opt-in benchmark mode of the walking-procedure tooling. It is authored by the operator (or seeded by the walker on first walk completion) and committed at `bc-docs/skills/<id>/evals.json`. The skill-PR classifier path filter (`lifecycle.md §10.1`) covers it via the pattern `^skills/[^/]+/(canonical|research-report|evals)\.(md|json)$`.

**Shape:**

```json
{
  "skill_id": "<kebab-case-id>",
  "evals_version": "<semver>",
  "prompts": [
    {
      "id": "<prompt-id>",
      "prompt": "<user-shaped prompt the runner sends to a fresh subagent loaded with the canonical>",
      "expectations": [
        "<LLM-judged assertion the response must satisfy>",
        "..."
      ],
      "tool_call_expectations": [
        "<LLM-judged assertion about the runner's tool-call pattern>",
        "..."
      ]
    }
  ]
}
```

**Field semantics:**

- `skill_id` — kebab-case skill id; MUST match the folder name (`skills/<skill_id>/evals.json`).
- `evals_version` — semver of the evals.json shape. Independent of `canonical_version` and `schema_version`; bumps when the evals harness's contract changes (new field shapes, new expectation types). v1 baseline: `0.1`.
- `prompts[]` — array of test prompts. Each prompt is independently runnable; the benchmark mode iterates the array and grades each in isolation.
- `prompts[].id` — stable identifier per prompt (kebab-case, deterministic). Allows the benchmark-grader to track per-prompt pass/fail over time and tie regressions to specific scenarios.
- `prompts[].prompt` — the user-shaped prompt the runner sends to a fresh subagent loaded with the skill's canonical. Should reflect realistic agent invocations rather than synthetic edge probes.
- `prompts[].expectations[]` — LLM-judged assertions the response must satisfy. The benchmark-grader scores each one with pass/fail plus evidence quoted from the response.
- `prompts[].tool_call_expectations[]` — LLM-judged assertions about the runner's tool-call pattern (e.g., "should call read_skill('X') at least once", "should NOT redirect mid-flow"). Scored alongside `expectations[]`.
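
The field rules above lend themselves to a structural check before a PR lands. The sketch below is one possible validator, not the harness's own: the kebab-case regex, the uniqueness check on `prompts[].id`, and the treatment of both expectation arrays as lists of strings are assumptions, and `validate_evals` and the sample document are hypothetical.

```python
import json
import re
from pathlib import PurePosixPath

# Assumed kebab-case definition: lowercase alphanumeric runs joined by hyphens.
KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def validate_evals(path: str, doc: dict) -> list[str]:
    """Return a list of problems found in an evals.json document."""
    problems = []

    # skill_id MUST match the folder name in skills/<skill_id>/evals.json.
    folder = PurePosixPath(path).parent.name
    if doc.get("skill_id") != folder:
        problems.append(
            f"skill_id {doc.get('skill_id')!r} does not match folder {folder!r}"
        )

    seen = set()
    for i, prompt in enumerate(doc.get("prompts", [])):
        pid = prompt.get("id", "")
        if not KEBAB.match(pid):
            problems.append(f"prompts[{i}].id {pid!r} is not kebab-case")
        if pid in seen:
            problems.append(f"prompts[{i}].id {pid!r} is duplicated")
        seen.add(pid)
        if not prompt.get("prompt"):
            problems.append(f"prompts[{i}].prompt is empty")
        for field in ("expectations", "tool_call_expectations"):
            items = prompt.get(field, [])
            if not isinstance(items, list) or not all(
                isinstance(item, str) for item in items
            ):
                problems.append(f"prompts[{i}].{field} must be a list of strings")
    return problems


# Hypothetical document, for illustration only.
example = json.loads("""
{
  "skill_id": "example-skill",
  "evals_version": "0.1",
  "prompts": [
    {
      "id": "first-visit-checklist",
      "prompt": "What do I need to bring to my first appointment?",
      "expectations": ["Lists the required documents"],
      "tool_call_expectations": ["Calls read_skill at least once"]
    }
  ]
}
""")
print(validate_evals("skills/example-skill/evals.json", example))  # []
```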

**Lifecycle.** The walker seeds evals.json on first walk completion with a minimal baseline prompt set; the operator extends it manually as new edge cases surface. evals.json is versioned with the canonical (same git history; same PR landings). It is NOT a generated artefact — it ships as authored.

**Out of scope for v1.** evals.json does not carry runtime metadata (latency budgets, tokens-in/tokens-out targets); those are runner-side concerns recorded in the benchmark output, not in the eval definitions themselves. Comparing each prompt against a baseline run without the canonical is a runner mode (`baseline: true`); evals.json does not encode that comparison directly.


## 5. Product-spec integration points

The build-tool artefacts described above interact with the product spec at the following points. When the product spec changes, the corresponding build-tool reference at `bc-skills/bc-corpus-creator/references/` must be updated to match.

| Build-tool artefact | Product-spec reference | Notes |
|---|---|---|
| `canonical.md` shape | `schemas.md §6.1` (skill schema) | The product spec is authoritative; bc-corpus-creator's `references/canonical-shape.md` is the build-time mirror. |
| Walking-procedure steps | `skills.md §15.1` | Manual walks per `skills.md §15.1` remain valid. bc-corpus-creator operationalises the same steps. |
| Source classes | `skills.md §15.2` | The closed enum + grade values are in the product spec. The `usage` enum on each source row (citation / corroboration / failure-mode-context / comparative) is internal to research-report and not surfaced in the product corpus. |
| Failure-mode inclusion rules | `skills.md §15.3` | research-report §11 carries the catalog; canonical body surfaces what it must. |
| Skill-PR classifier path filter | `lifecycle.md §10.1` | Covers the `canonical.md`, `research-report.md`, and `evals.json` artefacts uniformly. |

## 6. Process for build-tool spec updates

Changes to research-report.md or evals.json schemas follow the same amendment-proposal workflow as the product spec (`amendment-proposals/README.md`). A change to the build-tool artefact shape MAY require a corresponding update to the product spec when:

- The artefact's product-spec integration points (§5 above) shift
- The skill-PR classifier path filter needs to change
- A new artefact is introduced that ships in `bc-docs/skills/<id>/`

Build-tool-only changes (rubric refinement, voice-and-style updates, internal agent prompts) do not require product-spec amendments; they are committed directly to `bc-skills/bc-corpus-creator/references/` with operator review.