Issue 009 Execution

Purpose

This branch-local document records the execution plan for issue #9 after the architectural ADR has been accepted and before the implementation phases land.

Branch

The implementation branch for this issue is:

issue/9-capability-signal-layer

Phase 0 Outcome

Phase 0 is complete when both of the following exist and agree with each other:

docs/adr/ADR-006-capability-driven-signal-layer.md
this execution document

Phase 0 intentionally does not change runtime scoring behavior.

Phase 1 Outcome

Phase 1 produced a concrete inventory note:

docs/process/issue-009-capability-signal-layer-inventory.md

That note records the current implicit retrieval boundary, including:

current retrieval producer shapes
current score-bearing evidence families
current planner and channel routing assumptions
current explain-mode diagnostics already available for migration

Phase 2 Outcome

Phase 2 introduces the first internal retrieval-facing contract beside LanguageAnalyzer.

The first version is intentionally minimal:

versioned producer identity through producer_name, producer_version, and capability_version
declared retrieval capabilities
deterministic partitioning of known versus unknown capabilities for future diagnostics

This phase still does not change runtime scoring behavior.
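
A minimal sketch of what such a contract could look like. Only the three identity field names come from the list above; the class name `ProducerSpec`, the `partition_capabilities` helper, the abbreviated `KNOWN_CAPABILITIES` set, and the sample values are assumptions made for illustration and may not match the real module.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical, abbreviated vocabulary used only for this illustration.
KNOWN_CAPABILITIES = frozenset(
    {"symbol_lookup", "semantic_text", "embedding_similarity"}
)


@dataclass(frozen=True)
class ProducerSpec:
    """Versioned producer identity plus declared retrieval capabilities."""

    producer_name: str
    producer_version: str
    capability_version: str
    capabilities: frozenset[str] = frozenset()


def partition_capabilities(
    spec: ProducerSpec,
) -> tuple[tuple[str, ...], tuple[str, ...]]:
    """Split declared capabilities into (known, unknown), each sorted."""
    known = tuple(sorted(spec.capabilities & KNOWN_CAPABILITIES))
    unknown = tuple(sorted(spec.capabilities - KNOWN_CAPABILITIES))
    return known, unknown


# Sample values are invented for the example.
spec = ProducerSpec(
    producer_name="python-analyzer",
    producer_version="1",
    capability_version="1",
    capabilities=frozenset({"symbol_lookup", "time_travel"}),
)
known, unknown = partition_capabilities(spec)
```

Sorting both halves keeps the partition independent of set iteration order, which is what lets later diagnostics report unknown capabilities reproducibly.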

The current branch first introduced a query-layer compatibility bridge for channel-shaped retrieval paths. That compatibility layer has since been collapsed into direct query producer specs so the runtime no longer depends on synthetic RetrievalProducer objects.

The accepted architecture at this point is the "good Option 1" model:

analyzers continue to satisfy LanguageAnalyzer
retrieval-facing capability metadata lives in a shared query producer layer
the core must consume declared producer metadata rather than analyzer internals
analyzers may adopt RetrievalProducer later where that is genuinely useful, but this is not required for the main migration path

Phase 3 now has an initial schema scaffold as well:

src/codira/query/signals.py

That module defines immutable retrieval signals and deterministic sort keys, but does not yet change runtime aggregation.
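
As a rough illustration of "immutable signals with deterministic sort keys", the sketch below uses a frozen dataclass. The class name, field names, and key layout are assumptions and are not taken from `signals.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalSignal:
    """Immutable record of one piece of retrieval evidence (hypothetical fields)."""

    symbol_id: str      # durable symbol identity emitted by an analyzer
    kind: str           # e.g. "exact_symbol" or "embedding_similarity"
    producer_name: str  # attribution back to the declaring producer
    strength: float     # normalized contribution, not a final score

    def sort_key(self) -> tuple[str, str, str, float]:
        """Total order that does not depend on insertion order."""
        return (self.symbol_id, self.kind, self.producer_name, -self.strength)


signals = [
    RetrievalSignal("pkg.b", "exact_symbol", "query", 1.0),
    RetrievalSignal("pkg.a", "embedding_similarity", "query", 0.4),
    RetrievalSignal("pkg.a", "embedding_similarity", "query", 0.9),
]
ordered = sorted(signals, key=RetrievalSignal.sort_key)
```

Because every field takes part in the key, two runs over the same evidence produce the same order regardless of how the signals were collected.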

Detailed Implementation Plan

Phase 1 — Inventory current evidence and scoring entry points

Goal: Build a source-grounded inventory of the current ranking inputs before adding new abstractions.

Primary files:

src/codira/query/context.py
src/codira/query/classifier.py
src/codira/query/exact.py
src/codira/semantic/search.py
src/codira/contracts.py
docs/architecture/query-pipeline.md

Tasks:

Enumerate every current evidence source used by context_for.
Separate orchestration concepts from score-bearing concepts.
Record current merge inputs, diversity inputs, and explain outputs.
Identify where current logic branches on channel names, evidence families, or language-specific features.
Identify current invariants already protected by tests.

Deliverables:

implementation notes embedded in code comments or docs as appropriate
an updated execution ledger if the discovered boundary differs from the ADR

Exit criteria:

every current score-bearing path is accounted for
no hidden scoring entry point remains unlisted

Phase 2 — Minimal capability model

Goal: Introduce a small internal capability representation without changing behavior.

Primary files:

src/codira/contracts.py
src/codira/query/classifier.py
src/codira/query/context.py
tests/test_contracts.py
tests/test_context_rendering.py

Tasks:

Define the minimal capability vocabulary needed for current retrieval.
Decide whether capabilities attach first to analyzers, channels, or a new retrieval-producer abstraction.
Keep the first contract internal and deterministic.
Ensure missing capabilities degrade to absence rather than failure.

Accepted direction:

keep the retrieval-facing contract beside LanguageAnalyzer
do not require built-in analyzers to implement RetrievalProducer yet
centralize shared producer metadata in query-side descriptors
forbid core scoring from depending on analyzer implementation details

Initial capability candidates:

symbol_lookup
semantic_text
embedding_similarity
task_specialization
graph_relations
issue_annotations
diagnostics_metadata

Exit criteria:

the core can inspect capabilities generically
no ranking behavior changes yet
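
One way the core might inspect capabilities generically is sketched below. The seven capability names are the candidates listed above; the `DECLARED` mapping, the producer names in it, and the `producers_for` helper are hypothetical.

```python
from __future__ import annotations

# Candidate names from the list above.
CAPABILITY_CANDIDATES = (
    "symbol_lookup",
    "semantic_text",
    "embedding_similarity",
    "task_specialization",
    "graph_relations",
    "issue_annotations",
    "diagnostics_metadata",
)

# Hypothetical declarations keyed by producer name.
DECLARED: dict[str, frozenset[str]] = {
    "symbol": frozenset({"symbol_lookup"}),
    "semantic": frozenset({"semantic_text", "embedding_similarity"}),
}


def producers_for(
    capability: str, declared: dict[str, frozenset[str]]
) -> tuple[str, ...]:
    """Return producer names declaring `capability`, sorted; empty when none do."""
    return tuple(
        sorted(name for name, caps in declared.items() if capability in caps)
    )


# A capability nobody declares maps to an empty tuple rather than raising.
coverage = {cap: producers_for(cap, DECLARED) for cap in CAPABILITY_CANDIDATES}
```

The lookup never branches on a producer's type or language, only on what it declares, and an undeclared capability shows up as absence.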

Phase 3 — Typed signal model and normalization rules

Goal: Define the internal signal objects that become the future scoring substrate.

Primary files:

src/codira/query/signals.py (new module; the initial scaffold is already in place)
src/codira/contracts.py
tests/test_context_rendering.py
tests/test_retrieval_merge.py

Tasks:

Create a typed signal model with deterministic ordering fields.
Define normalized properties for contribution strength, distance, and attribution.
Reuse durable symbol identities already emitted by analyzers.
Keep final scores out of the signal contract.

Recommended signal kinds for the first pass:

exact symbol
token/symbol text
embedding similarity
relation
proximity
repeated evidence

Exit criteria:

signal schema is explicit and test-covered
normalization semantics are documented in code and consistent with the ADR
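
The plan does not fix a normalization rule, so the following is only one possible convention: min-max scaling with clamping onto [0.0, 1.0]. The function name and the choice of range are assumptions.

```python
import math


def normalize_strength(raw: float, lower: float, upper: float) -> float:
    """Map `raw` onto [0.0, 1.0] by min-max scaling, clamping out-of-range input."""
    if not all(math.isfinite(v) for v in (raw, lower, upper)):
        raise ValueError("strength inputs must be finite")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    scaled = (raw - lower) / (upper - lower)
    return min(1.0, max(0.0, scaled))
```

Rejecting non-finite input up front keeps NaN out of later comparisons, where it would otherwise make sort order unpredictable.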

Phase 4 — Adapters from current evidence to signals

Goal: Represent existing retrieval evidence as signals without changing intended ranking behavior.

Primary files:

src/codira/query/context.py
src/codira/query/exact.py
src/codira/semantic/search.py
tests/test_retrieval_merge.py
tests/test_characterization_phase2.py

Tasks:

Wrap exact symbol candidates as signals.
Wrap embedding candidates as signals.
Wrap current semantic/textual candidates as signals.
Preserve existing ranking characteristics while both representations may temporarily coexist.

Exit criteria:

current evidence can be represented entirely as signal objects
old and new representations can be compared during migration
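
The adapter idea can be sketched as thin wrappers that leave the legacy evidence untouched. The shapes of the legacy data here (a list of symbol ids and a list of `(id, similarity)` pairs) are invented for the example and are not the real structures in `exact.py` or `search.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    symbol_id: str
    kind: str
    strength: float


def exact_to_signal(symbol_id: str) -> Signal:
    """Exact symbol hits carry full strength in this sketch."""
    return Signal(symbol_id, "exact_symbol", 1.0)


def embedding_to_signal(symbol_id: str, similarity: float) -> Signal:
    """Clamp an assumed similarity value into [0.0, 1.0]."""
    return Signal(symbol_id, "embedding_similarity", min(1.0, max(0.0, similarity)))


# Hypothetical legacy-shaped evidence.
legacy_exact = ["pkg.mod.parse"]
legacy_embedding = [("pkg.mod.parse", 0.82), ("pkg.mod.render", 1.3)]

signals = [exact_to_signal(s) for s in legacy_exact]
signals += [embedding_to_signal(s, sim) for s, sim in legacy_embedding]

# While both representations coexist they can be compared directly.
legacy_ids = set(legacy_exact) | {s for s, _ in legacy_embedding}
signal_ids = {sig.symbol_id for sig in signals}
```

A comparison like `legacy_ids == signal_ids` is the kind of migration check that could run in characterization tests while both paths exist.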

Phase 5 — Capability-gated signal collection

Goal: Collect signals through generic capability checks rather than implicit feature-specific assumptions.

Primary files:

src/codira/query/context.py
src/codira/query/classifier.py
tests/test_context_rendering.py
tests/test_retrieval_merge.py

Tasks:

Add a deterministic signal collection step.
Use capability declarations to decide which signal families can be built.
Preserve current planner ordering while changing the internal substrate.
Keep unsupported capability families silent and deterministic.

Exit criteria:

the core assembles a per-query signal set generically
the current planner remains understandable and explainable
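
A sketch of capability-gated collection under simplifying assumptions: signals are plain tuples, builders are keyed by the capability they need, and iteration order is fixed by sorting capability names. The real planner ordering may differ, and the builder functions are hypothetical.

```python
from __future__ import annotations

from typing import Callable

Signal = tuple[str, str, float]  # (symbol_id, kind, strength), simplified


def build_exact(query: str) -> list[Signal]:
    return [(query, "exact_symbol", 1.0)]


def build_graph(query: str) -> list[Signal]:
    return [(f"{query}.caller", "relation", 0.5)]


# Builders keyed by the capability they require.
BUILDERS: dict[str, Callable[[str], list[Signal]]] = {
    "symbol_lookup": build_exact,
    "graph_relations": build_graph,
}


def collect_signals(query: str, declared: frozenset[str]) -> list[Signal]:
    """Run only builders whose capability is declared; skip the rest silently."""
    collected: list[Signal] = []
    for capability in sorted(BUILDERS):  # fixed iteration order
        if capability in declared:
            collected.extend(BUILDERS[capability](query))
    return collected


signals = collect_signals("parse", frozenset({"symbol_lookup", "unknown_capability"}))
```

An undeclared family contributes nothing and an unrecognized declaration is ignored, so neither case raises or changes the order of the remaining signals.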

Phase 6 — Core signal aggregation for current ranking behavior

Goal: Move scoring from current channel/evidence-specific merge helpers onto signal aggregation while preserving behavior as closely as possible.

Primary files:

src/codira/query/context.py
tests/test_retrieval_merge.py
tests/test_characterization_phase2.py
tests/test_context_rendering.py

Tasks:

Define central aggregation rules over signals.
Preserve exact symbol dominance explicitly.
Preserve deterministic ordering and stable tie-breaking.
Keep current diversity and role-bias semantics compatible.

Exit criteria:

merged ranking reads normalized signals rather than ad hoc evidence bundles
ranking-sensitive tests remain green
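
To make "exact symbol dominance" and "stable tie-breaking" concrete, here is a small sketch that sums strengths per symbol and sorts on a composite key. The additive rule and the tuple shape are assumptions; the existing merge helpers may weigh evidence differently.

```python
from __future__ import annotations

from collections import defaultdict

Signal = tuple[str, str, float]  # (symbol_id, kind, strength), simplified


def rank(signals: list[Signal]) -> list[str]:
    """Rank symbol ids: exact matches first, then summed strength, then id."""
    has_exact: dict[str, bool] = defaultdict(bool)
    total: dict[str, float] = defaultdict(float)
    for symbol_id, kind, strength in signals:
        total[symbol_id] += strength
        has_exact[symbol_id] = has_exact[symbol_id] or kind == "exact_symbol"
    # The symbol id is the last key component, so ties never fall back to input order.
    return sorted(total, key=lambda sid: (not has_exact[sid], -total[sid], sid))


signals = [
    ("b.render", "embedding_similarity", 0.9),
    ("b.render", "token_text", 0.8),
    ("a.parse", "exact_symbol", 1.0),
    ("c.load", "token_text", 0.5),
    ("a.load", "token_text", 0.5),
]
ranking = rank(signals)
```

Here `a.parse` ranks first even though `b.render` accumulates more total strength, because exact-match status is compared before the sum.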

Phase 7 — Call and proximity integration through signals

Goal: Migrate graph-derived relevance into the signal layer instead of leaving it as special-case query logic.

Primary files:

src/codira/query/context.py
src/codira/query/exact.py
tests/test_call_graph.py
tests/test_retrieval_merge.py

Tasks:

Represent call-graph and include-graph evidence as signals where appropriate.
Normalize direct relation versus proximity semantics.
Bound graph-derived contributions centrally.
Avoid language-name branching in the final aggregation step.

Exit criteria:

call/proximity relevance participates through signals
language-specific extraction remains outside final scoring policy
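
One possible reading of "normalize direct relation versus proximity" and "bound graph-derived contributions centrally" is a hop-based decay with a single cap. The decay factor, the cap value, and both function names are illustrative assumptions, not values from the ADR.

```python
GRAPH_CAP = 1.5  # assumed central bound on graph-derived contribution


def graph_strength(hops: int, decay: float = 0.5) -> float:
    """A direct relation (1 hop) gets full weight; each extra hop multiplies by decay."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return decay ** (hops - 1)


def bounded_graph_contribution(hop_counts: list[int]) -> float:
    """Sum per-edge strengths, then apply the single central cap."""
    return min(GRAPH_CAP, sum(graph_strength(h) for h in hop_counts))
```

Because the function sees only hop counts, call-graph and include-graph evidence can share the same bound without the aggregation step knowing which language produced the edges.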

Phase 8 — Explain and JSON alignment

Goal: Make the new architecture visible in explain mode and structured output.

Primary files:

src/codira/query/context.py
src/codira/schema/context.schema.json
tests/test_json_schema.py
tests/test_context_rendering.py
tests/test_characterization_phase2.py

Tasks:

Replace channel-only merge diagnostics with signal-aware explain output.
Preserve explain stability and readability.
Decide whether JSON output can evolve in place or requires versioned additions.
Keep current planner metadata while adding signal contribution tracing.

Exit criteria:

explain mode can attribute ranking to signal families and capabilities
JSON schema and tests are updated intentionally
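
A sketch of how signal attribution could be added beside existing planner metadata as a purely additive field. The key names (`planner`, `signal_contributions`) and the entry shape are hypothetical and do not reflect `context.schema.json`.

```python
from __future__ import annotations

import json
from typing import Any


def explain_entry(
    symbol_id: str,
    signals: list[tuple[str, str, float]],  # (kind, capability, strength)
    planner: dict[str, Any],
) -> dict[str, Any]:
    """Group contributions by signal kind and keep planner metadata unchanged."""
    contributions: dict[str, dict[str, Any]] = {}
    for kind, capability, strength in signals:
        entry = contributions.setdefault(
            kind, {"capability": capability, "strength": 0.0}
        )
        entry["strength"] += strength
    return {
        "symbol": symbol_id,
        "planner": planner,
        "signal_contributions": contributions,
    }


entry = explain_entry(
    "a.parse",
    [
        ("exact_symbol", "symbol_lookup", 1.0),
        ("token_text", "semantic_text", 0.25),
        ("token_text", "semantic_text", 0.25),
    ],
    {"channel": "symbol"},
)
rendered = json.dumps(entry, sort_keys=True)  # sorted keys keep output stable
```

Adding a new key rather than reshaping existing ones is one way to let the JSON output evolve in place; whether that is acceptable or a versioned addition is needed remains the open decision listed in the tasks.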

Phase 9 — Analyzer and channel contract follow-up

Goal: Adopt the new capability contract at the producer boundary in a disciplined way.

Primary files:

src/codira/contracts.py
src/codira/analyzers/python.py
src/codira/analyzers/c.py
src/codira/analyzers/bash.py
examples/plugins/codira_demo_analyzer/...
tests/test_contracts.py
tests/test_plugins.py

Tasks:

Extend the producer-side contract with explicit capability declaration.
Update built-in analyzers incrementally.
Update the demo analyzer plugin example.
Avoid forcing third-party API stability prematurely; keep the new surface documented as internal until it has proven out.

Exit criteria:

at least one clean end-to-end producer path uses the new contract
built-ins and plugin tests agree on the new expectations
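
Incremental adoption could be modelled with a structural protocol, so analyzers that have not opted in yet keep working. `DeclaresCapabilities`, `retrieval_capabilities`, and both analyzer classes are hypothetical names for this sketch; they are not the project's `LanguageAnalyzer` or `RetrievalProducer` contracts.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class DeclaresCapabilities(Protocol):
    """Hypothetical opt-in surface for producer-side capability declaration."""

    def retrieval_capabilities(self) -> frozenset[str]: ...


class DemoAnalyzer:
    """An analyzer that has adopted the new surface."""

    def retrieval_capabilities(self) -> frozenset[str]:
        return frozenset({"symbol_lookup"})


class LegacyAnalyzer:
    """An analyzer that has not adopted the new surface yet."""


def declared_capabilities(analyzer: object) -> frozenset[str]:
    """Read declared capabilities; analyzers without the method declare none."""
    if isinstance(analyzer, DeclaresCapabilities):
        return analyzer.retrieval_capabilities()
    return frozenset()
```

With `runtime_checkable`, the `isinstance` check only tests that the method exists, so built-ins can be migrated one at a time without a shared base class.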

Phase 10 — Validation matrix and migration hardening

Goal: Prove the architecture preserves current guarantees and is safe for follow-on analyzer work such as JSON and Make.

Validation categories:

deterministic ordering
exact-match dominance
capability-gated behavior
cross-language consistency
explain stability
regression compatibility for current retrieval behavior
bounded aggregation complexity

Primary files:

tests/test_retrieval_merge.py
tests/test_context_rendering.py
tests/test_characterization_phase2.py
tests/test_call_graph.py
tests/test_json_schema.py

Exit criteria:

the ADR invariants are covered by tests
JSON and Make analyzer work can target the new boundary instead of the old merge internals
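
The deterministic-ordering category lends itself to a permutation check. The helper below is a generic sketch, not an existing test in the suite; both rankers are toy stand-ins for the real merge.

```python
from __future__ import annotations

from itertools import permutations
from typing import Callable

Item = tuple[str, float]  # (symbol_id, score), simplified


def rank(items: list[Item]) -> list[Item]:
    """Score descending, then symbol id as an explicit tie-breaker."""
    return sorted(items, key=lambda it: (-it[1], it[0]))


def rank_score_only(items: list[Item]) -> list[Item]:
    """Counter-example: ties fall back to input order."""
    return sorted(items, key=lambda it: -it[1])


def is_order_independent(
    rank_fn: Callable[[list[Item]], list[Item]], items: list[Item]
) -> bool:
    """True when every permutation of the input yields the same ranking."""
    expected = rank_fn(list(items))
    return all(rank_fn(list(p)) == expected for p in permutations(items))


items = [("a", 0.5), ("b", 0.5), ("c", 0.9)]
```

Exhaustive permutation is only practical for small fixtures, which is usually enough to expose a missing tie-breaker such as the one in `rank_score_only`.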

Suggested PR Sequence

1. ADR + execution plan
2. evidence/scoring inventory + minimal capability scaffolding
3. signal model
4. signal adapters for current evidence
5. signal aggregation for existing ranking behavior
6. graph/proximity migration
7. explain and schema alignment
8. producer-side contract adoption and hardening