ADR-001 — Rank-based Multi-Channel Retrieval (RRF)¶

Date: 24/03/2026 Status: Accepted

Context¶

The initial retrieval pipeline used a score-based merge across channels (e.g., symbol, semantic). Each channel produced scores that were directly compared during merging.

This approach led to instability and conceptual issues:

Scores from different channels were not comparable (different scales, distributions, semantics).
Attempts to normalize or weight scores resulted in fragile, hard-to-tune behavior.
The semantic channel was implemented as a transformation of symbol results, not as an independent retrieval strategy.
As a result, semantic contributed little to recall and could not influence ranking meaningfully.

At the same time, the system evolved toward a multi-channel architecture, where different retrieval strategies should contribute independently.

Decision¶

Adopt a rank-based merge strategy using Reciprocal Rank Fusion (RRF) and redefine channels as independent retrieval mechanisms.

Specifically:

Replace score-based merging with RRF:
Final ranking is computed as: RRF score = Σ (1 / (rank + 1)) across channels.
Raw scores are no longer used for cross-channel comparison.
Redefine channels:
Each channel is responsible for retrieving and ranking its own candidates.
Channels must be independent (no longer layered or derived from each other).
Refactor semantic channel:
Remove dependency on _retrieve_symbol_candidates.
Query the database (symbol_index) directly.
Use heuristic scoring based on name, module, and (if available) docstring.
Preserve provenance:
Keep per-channel scores for explain/debug purposes.
Remove the “winner” concept, which is no longer meaningful under RRF.

Rationale¶

This decision resolves the fundamental mismatch between:

Channel-local scoring (confidence within a channel)
Global merging (ordering across channels)

Score-based fusion attempted to use a single value for both purposes, which is not valid when channels use different scoring models.

Rank-based fusion:

Eliminates the need for score normalization
Is robust to scale differences
Naturally rewards agreement across channels
Is deterministic and simple

Making channels independent ensures that:

Each channel can introduce new candidates
The system gains recall, not just re-ranking
Future channels (e.g., structural, embeddings) can be added without redesign

Consequences¶

Positive¶

Stable and predictable ranking behavior
Improved recall through semantic channel diversification
Clean architectural separation between channels
Easier extension with new retrieval strategies
No need for fragile score calibration

Negative¶

Raw scores lose meaning as global ranking signals
Explain output no longer has a simple “winner” concept
Semantic channel currently performs a full scan (may need optimization later)

Neutral / Trade-offs¶

Provenance still exposes scores, but only for inspection
Ranking logic is now slightly less intuitive without understanding RRF

Notes¶

Future improvements:
Optimize semantic retrieval (indexing, filtering)
Incorporate docstring indexing more efficiently
Introduce additional channels (e.g., structural, embedding-based)
Possibly expose RRF contributions in explain output
This ADR marks the transition from:
single-pipeline scoring system to
true multi-channel retrieval architecture