
ADR-002 — Prefix-Scoped Query Filtering

Date: 27/03/2026
Status: Accepted

Context

codira query subcommands originally operated on the whole indexed repository. That worked for smaller repositories, but it made targeted work on one subsystem noisy:

  • exact symbol lookup returned duplicates from unrelated domains
  • embedding and context retrieval mixed matches across unrelated areas
  • call and callable-reference inspection could not be constrained to one owner domain
  • docstring audits could not be limited to a single subtree

At the storage level, several tables duplicated full file paths instead of reusing the central files table:

  • symbol_index
  • docstring_issues
  • call_records
  • callable_ref_records

The derived relation tables also lacked path ownership metadata:

  • call_edges
  • callable_refs

Together, the duplicated paths and the missing ownership metadata made uniform prefix filtering awkward and inconsistent.

Decision

Adopt a permanent --prefix <repo-root-relative-path> option for supported read/query subcommands and make prefix filtering schema-backed.

Supported subcommands and semantics:

  • symbol --prefix P NAME: restrict to symbols whose defining file is under P
  • embeddings --prefix P QUERY: restrict to matched symbols whose file is under P
  • ctx --prefix P QUERY: restrict retrieval, expansion, docstring issues, and references to files under P
  • calls --prefix P NAME: restrict to call edges whose caller file is under P
  • refs --prefix P NAME: restrict to callable-object references whose owner file is under P
  • audit --prefix P: restrict to issues for symbols defined under P

Implementation details:

  • normalize --prefix once relative to the repository root
  • reject user-supplied absolute prefixes at the CLI boundary
  • use the central files.id as the owner key across path-sensitive tables
  • join through files.path when evaluating prefix filters
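
The first two implementation points can be illustrated with a minimal sketch. The helper name normalize_prefix and its error messages are hypothetical and not taken from the codira source. It also assumes POSIX-style separators and keeps the result repo-root-relative, whereas the real helpers may carry an absolute path internally after validation.

```python
import posixpath


def normalize_prefix(raw: str) -> str:
    """Normalize a user-supplied --prefix to a repo-root-relative path.

    Hypothetical helper: returns "" for the repository root itself and
    raises ValueError for absolute paths or paths that escape the root.
    """
    # Reject absolute prefixes at the CLI boundary.
    if posixpath.isabs(raw):
        raise ValueError(f"--prefix must be repo-root-relative: {raw!r}")

    # Collapse "./", "//", trailing slashes, and inner ".." segments once.
    normalized = posixpath.normpath(raw)

    # A leading ".." after normalization means the path leaves the repository.
    if normalized == ".." or normalized.startswith("../"):
        raise ValueError(f"--prefix escapes the repository root: {raw!r}")

    return "" if normalized == "." else normalized
```

Normalizing once at the boundary means every downstream query can compare against files.path without repeating path cleanup.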

Schema changes:

  • replace repeated file_path columns with file_id in:
      • symbol_index
      • call_records
      • callable_ref_records
  • add file_id to docstring_issues
  • add owner-side file identifiers to derived relation tables:
      • caller_file_id on call_edges
      • owner_file_id on callable_refs
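
A rough sketch of how a file_id-backed lookup might look, using an in-memory SQLite database. Only files.id, files.path, and symbol_index.file_id come from this ADR. The name column, the helper symbols_under, and the sample rows are assumptions made for illustration, and the actual codira schema may differ.

```python
import sqlite3

# Minimal, assumed subset of the schema: one central files table and one
# path-sensitive table that refers to it by file_id instead of a path.
SCHEMA = """
CREATE TABLE files (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE              -- repo-root-relative path
);
CREATE TABLE symbol_index (
    name    TEXT NOT NULL,                 -- hypothetical column
    file_id INTEGER NOT NULL REFERENCES files(id)
);
"""


def symbols_under(conn, name, prefix):
    """Return defining file paths for `name`, restricted to `prefix`.

    An empty prefix means the whole repository. A non-empty prefix matches
    either that exact file or any path below that directory.
    """
    rows = conn.execute(
        """
        SELECT f.path
        FROM symbol_index AS s
        JOIN files AS f ON f.id = s.file_id
        WHERE s.name = :name
          AND (:prefix = ''
               OR f.path = :prefix
               OR substr(f.path, 1, length(:prefix) + 1) = (:prefix || '/'))
        ORDER BY f.path
        """,
        {"name": name, "prefix": prefix},
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO files (id, path) VALUES (?, ?)",
    [(1, "src/core/parser.py"), (2, "src/core_utils/parser.py")],
)
conn.executemany(
    "INSERT INTO symbol_index (name, file_id) VALUES (?, ?)",
    [("parse", 1), ("parse", 2)],
)
```

Comparing with substr rather than LIKE sidesteps wildcard escaping for paths that contain % or _, and the trailing "/" keeps src/core from matching src/core_utils.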

Rationale

This design keeps the filtering rule uniform while minimizing duplicated path storage.

Using the central files table as the owner key has two advantages:

  • smaller tables and indexes, especially for raw call/reference record tables
  • one consistent join path for owner-file filtering

Filtering semantics for relation queries are intentionally owner-side:

  • calls is about caller-owned edges
  • refs is about owner-owned callable references

This avoids ambiguous "either side" semantics and keeps the feature teachable.
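
The owner-side rule can be illustrated with a small sketch for call edges. The caller_file_id column is from this ADR. The caller_name and callee_name columns, the helper edges_under, and the sample data are hypothetical, and matching on NAME is left out to keep the example focused on the prefix rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
    CREATE TABLE call_edges (
        caller_name    TEXT NOT NULL,      -- hypothetical column
        callee_name    TEXT NOT NULL,      -- hypothetical column
        caller_file_id INTEGER NOT NULL REFERENCES files(id)
    );
    INSERT INTO files VALUES (1, 'api/handlers.py'), (2, 'storage/db.py');
    -- handle_request lives in api/ and calls save, which lives in storage/.
    INSERT INTO call_edges VALUES
        ('handle_request', 'save', 1),
        ('save', 'open_connection', 2);
    """
)


def edges_under(conn, prefix):
    """Return (caller, callee) pairs whose caller file is under `prefix`."""
    rows = conn.execute(
        """
        SELECT e.caller_name, e.callee_name
        FROM call_edges AS e
        JOIN files AS f ON f.id = e.caller_file_id
        WHERE f.path = :prefix
           OR substr(f.path, 1, length(:prefix) + 1) = (:prefix || '/')
        ORDER BY e.caller_name
        """,
        {"prefix": prefix},
    )
    return rows.fetchall()
```

With this data, the edge from handle_request to save appears under the api prefix only, even though save is defined under storage. Only the caller's file decides membership.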

For ctx, prefix filtering is applied throughout the pipeline rather than only at the end, which reduces noise and wasted work.
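
One way to see why early filtering matters is a toy ranking step. This is only an illustration under assumed names: Candidate, its score field, and both functions are hypothetical and do not describe the actual ctx pipeline. For brevity the sketch handles directory prefixes only.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    path: str     # repo-root-relative file path of the matched symbol
    score: float  # hypothetical relevance score, higher is better


def filter_last(candidates, prefix, k):
    """Rank everything, keep the top k, then drop out-of-scope hits."""
    top = sorted(candidates, key=lambda c: c.score, reverse=True)[:k]
    return [c for c in top if c.path.startswith(prefix + "/")]


def filter_first(candidates, prefix, k):
    """Drop out-of-scope hits first, then rank and keep the top k."""
    scoped = [c for c in candidates if c.path.startswith(prefix + "/")]
    return sorted(scoped, key=lambda c: c.score, reverse=True)[:k]


CANDIDATES = [
    Candidate("ui/a.py", 0.9),
    Candidate("ui/b.py", 0.8),
    Candidate("core/x.py", 0.7),
    Candidate("core/y.py", 0.6),
]
```

With k = 2 and prefix core, filter_last spends the whole result budget on ui matches and returns nothing, while filter_first returns both core matches.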

Consequences

Positive

  • uniform scoping model across supported query subcommands
  • better signal when working in one subtree or file
  • smaller path-sensitive tables due to file_id reuse
  • exact, schema-backed filtering for audit
  • owner-side relation filtering that is deterministic and easy to reason about

Negative

  • schema version bump and rebuild cost for existing indexes
  • additional joins through files in some query paths
  • more explicit query-surface complexity in CLI and helper signatures

Neutral / Trade-offs

  • --prefix is relative by contract, but internal helper flows may pass the normalized absolute path after validation
  • relation filtering is not symmetric; it is intentionally tied to owner/caller semantics

Notes

  • Existing indexes are rebuilt through the normal schema-version refresh path.
  • Directory and single-file prefixes are both supported.
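
A minimal sketch of a matching rule that covers both directory and single-file prefixes. The function name is_under is hypothetical, and both arguments are assumed to be normalized repo-root-relative paths.

```python
def is_under(path: str, prefix: str) -> bool:
    """Return True if `path` is the prefix itself or lies below it.

    An empty prefix stands for the repository root and matches everything.
    """
    if prefix == "":
        return True
    # Exact match covers single-file prefixes. The "/" boundary covers
    # directories without matching siblings that merely share leading text.
    return path == prefix or path.startswith(prefix + "/")
```

The explicit "/" boundary is the important detail: a plain startswith check would let the prefix src/core match src/core_utils/io.py.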