Storage Backends
The current repository has two concrete first-party backends:
- SQLite
- DuckDB
Current Backend Responsibilities
Backends currently own:
- schema creation and refresh for their concrete storage file
- indexing-side persistence orchestration behind the IndexBackend contract
- exact-query execution exposed through backend methods
- embedding inventory reads and candidate retrieval exposed through backend methods
- repository-local storage paths under .codira/
- persisted runtime plugin inventory and per-file analyzer ownership metadata
Current package-local ownership notes:
- packages/codira-backend-sqlite/.../sqlite_support.py owns the SQLite helper implementation
- packages/codira-backend-sqlite/.../sqlite_storage.py owns the SQLite package-local bootstrap and path entrypoints used by the production backend
- packages/codira-backend-duckdb/.../duckdb_support.py owns the DuckDB persistence helper implementation
- packages/codira-backend-duckdb/.../repo_storage.py owns the DuckDB-local seam for generic .codira directory and metadata path access
- packages/codira-backend-duckdb/.../duckdb_query_backend.py owns the DuckDB-local query and maintenance implementation used by the production backend
Index Session Contract
Issue #30 introduces an explicit split between the backend read path and the
backend write path.
Read-side responsibilities remain on IndexBackend:
- loading runtime inventory and analyzer inventory
- loading file hashes and per-file analyzer ownership
- checking embedding backend compatibility
- counting reusable embeddings for unchanged files
- processing pending embeddings for codira index --embeddings-only
- serving normal query commands
- reporting whether warm-index maintenance still needs mutation work
Write-side responsibilities now belong to begin_index_session(root) and the
returned IndexWriteSession:
- purge skipped docstring issues
- prune orphaned embeddings
- load reusable embeddings for paths being replaced
- prepare full or incremental storage replacement
- persist analyzed file snapshots
- queue deferred embedding rows when embedding indexing is deferred
- rebuild derived indexes
- write runtime inventory
- commit, abort, and close
This contract exists to make warm read-heavy command paths cheap while keeping mutation ownership explicit and backend-local.
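The read/write split can be pictured with a small sketch. Only the names IndexBackend, IndexWriteSession, and begin_index_session(root) come from the text above; the individual method names (load_file_hashes, persist_file) and the toy classes are hypothetical stand-ins, not the real contract in src/codira/contracts.py.

```python
from __future__ import annotations

from pathlib import Path
from typing import Protocol


class IndexWriteSession(Protocol):
    """Write side: all mutation goes through a session (method names assumed)."""

    def persist_file(self, path: str, file_hash: str) -> None: ...
    def commit(self) -> None: ...
    def abort(self) -> None: ...
    def close(self) -> None: ...


class IndexBackend(Protocol):
    """Read side, plus the single entry point into the write side."""

    def load_file_hashes(self, root: Path) -> dict[str, str]: ...
    def begin_index_session(self, root: Path) -> IndexWriteSession: ...


class ToySession:
    """Hypothetical in-memory session that stages writes until commit."""

    def __init__(self, store: dict[str, str]) -> None:
        self._store = store
        self._pending: dict[str, str] = {}
        self.closed = False

    def persist_file(self, path: str, file_hash: str) -> None:
        self._pending[path] = file_hash

    def commit(self) -> None:
        self._store.update(self._pending)
        self._pending.clear()

    def abort(self) -> None:
        self._pending.clear()

    def close(self) -> None:
        self.closed = True


class ToyBackend:
    """Hypothetical backend: reads never open a session."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def load_file_hashes(self, root: Path) -> dict[str, str]:
        return dict(self._store)

    def begin_index_session(self, root: Path) -> ToySession:
        return ToySession(self._store)


def index_one_file(backend: IndexBackend, root: Path, path: str, file_hash: str) -> None:
    """Commit on success, abort on failure, always close."""
    session = backend.begin_index_session(root)
    try:
        session.persist_file(path, file_hash)
        session.commit()
    except Exception:
        session.abort()
        raise
    finally:
        session.close()
```

In this sketch a caller that only reads (load_file_hashes) never constructs a session, which is the property the contract is meant to preserve for warm command paths.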
Current Constraints
The accepted backend model is still constrained:
- one active backend per repository instance
- backend-neutral orchestration above the concrete storage implementation
- deterministic query and indexing semantics preserved across backends
- no multi-backend live switching inside one repository state directory
DuckDB is intentionally file-local, not a shared remote service backend. Its role is to provide a second production-grade backend with stronger local analytical behavior for larger indexes, including future documentation-heavy channels.
The current DuckDB backend is no longer coupled to SQLite runtime types or the SQLite backend package. Its query and maintenance implementation is fully owned inside the DuckDB package boundary.
Phase-8 Selection Rules
Phase 8 made backend activation explicit through src/codira/registry.py.
Issue #17 moves the persistent selection source into Codira configuration while
preserving the environment variable as a process override.
- the configured backend is read from effective configuration key backend.name
- CODIRA_INDEX_BACKEND overrides config files for the current process
- the default backend is sqlite
- unsupported names fail fast with ValueError
- all current indexing and query entry points resolve the backend through the registry instead of constructing it ad hoc
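The precedence described by these rules can be sketched as a single resolution function. This is an illustration, not the code in src/codira/registry.py: the function name, the flat config mapping keyed by "backend.name", the hard-coded tuple of supported names, and the choice to treat an empty environment variable as unset are all assumptions.

```python
import os
from collections.abc import Mapping

# Assumption: the first-party names listed in this document.
SUPPORTED_BACKENDS = ("sqlite", "duckdb")
DEFAULT_BACKEND = "sqlite"


def resolve_backend_name(
    config: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the active backend name: env override, then config, then default."""
    name = (
        environ.get("CODIRA_INDEX_BACKEND")  # process-level override
        or config.get("backend.name")        # persistent configuration
        or DEFAULT_BACKEND
    )
    if name not in SUPPORTED_BACKENDS:
        # Fail fast instead of silently falling back to the default.
        raise ValueError(f"unsupported index backend: {name!r}")
    return name
```

Passing the environment as a parameter keeps the sketch testable without mutating os.environ.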
Accepted Migration Direction
The accepted target is not “many active backends at once”. The accepted target is:
- one active backend per repository instance
- backend-neutral contracts above the concrete storage implementation
- preserved deterministic query and indexing semantics
Current first-party backend roles:
sqlite:
- default backend
- smallest operational surface
- file-local repository storage under .codira/index.db
duckdb:
- optional backend selected through CODIRA_INDEX_BACKEND=duckdb
- file-local repository storage under .codira/index.duckdb
- better fit for larger local analytical or document-heavy indexes
Phase 20 extends backend persisted state with:
- analyzer ownership columns on files rows
- one-row runtime inventory for backend name, backend version, and coverage completeness
- analyzer inventory rows carrying analyzer versions and discovery-glob snapshots
Phase 21 makes SQLite use that metadata for deterministic rebuild policy:
- per-file analyzer ownership participates in incremental reuse decisions
- backend runtime inventory mismatches trigger automatic rebuilds
- analyzer inventory mismatches trigger automatic rebuilds
DuckDB follows the same rebuild policy through the same runtime and analyzer inventory contract.
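As a rough illustration of the rebuild policy, the persisted inventories can be compared against the ones the running process would write. The record shapes below mirror the fields named above, but the class names, field names, and the rule that a missing runtime inventory counts as a mismatch are assumptions rather than the backends' actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RuntimeInventory:
    """One-row runtime inventory (field names are illustrative)."""

    backend_name: str
    backend_version: str
    coverage_complete: bool


@dataclass(frozen=True)
class AnalyzerRecord:
    """One analyzer inventory row (field names are illustrative)."""

    name: str
    version: str
    discovery_globs: tuple[str, ...]


def needs_full_rebuild(
    stored_runtime: Optional[RuntimeInventory],
    current_runtime: RuntimeInventory,
    stored_analyzers: Iterable[AnalyzerRecord],
    current_analyzers: Iterable[AnalyzerRecord],
) -> bool:
    """Return True when either inventory disagrees with the running process."""
    # Assumption: an index with no runtime inventory is treated as stale.
    if stored_runtime is None or stored_runtime != current_runtime:
        return True
    # Order-insensitive comparison: any version or glob change is a mismatch.
    return set(stored_analyzers) != set(current_analyzers)
```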
Warm Read Path
Unchanged repositories should not enter writer setup just to confirm the index is still current.
The accepted warm-path sequence is:
- inspect runtime inventory through the backend read path
- inspect analyzer inventory through the backend read path
- load indexed file hashes and analyzer ownership
- compare against the current repository scan
- skip begin_index_session(...) only when no file mutations are required and the backend reports no maintenance work
Maintenance work still forces a write session even when file contents are unchanged. Current examples:
- stale shell-owned docstring issues from older audit rules
- orphaned embedding rows left behind by previous storage versions
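The warm-path decision can be sketched as a pure planning step that runs before any writer is opened. The function and type names here are hypothetical; each file is modelled as a (hash, analyzer name) pair, and the two boolean inputs stand in for the inventory checks and the backend's maintenance report.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WarmPlan:
    changed: tuple[str, ...]   # new files, or files whose hash or owner differs
    deleted: tuple[str, ...]   # indexed files missing from the current scan
    needs_write_session: bool  # False means begin_index_session(...) is skipped


def plan_warm_path(
    indexed: dict[str, tuple[str, str]],
    scanned: dict[str, tuple[str, str]],
    inventories_match: bool,
    maintenance_pending: bool,
) -> WarmPlan:
    """Compare indexed state with the current scan and decide whether to write."""
    changed = tuple(sorted(p for p, state in scanned.items() if indexed.get(p) != state))
    deleted = tuple(sorted(p for p in indexed if p not in scanned))
    needs_write = bool(
        not inventories_match     # runtime or analyzer inventory mismatch
        or changed
        or deleted
        or maintenance_pending    # e.g. stale docstring issues, orphaned embeddings
    )
    return WarmPlan(changed, deleted, needs_write)
```

An unchanged repository with matching inventories and no pending maintenance yields needs_write_session set to False, so the caller can return without touching writer setup.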
Current Boundary Status
The branch-local backend-agnostic refactor has established these boundaries:
- core query modules now type backend connections through backend-neutral query protocols rather than sqlite3.Connection
- SQLite helper ownership has moved behind the SQLite backend package boundary
- SQLite bootstrap and database-path entrypoints are now package-local backend seams rather than direct backend imports of codira.storage
- DuckDB persistence no longer routes through the SQLite helper module
- DuckDB no longer imports codira_backend_sqlite at runtime
- benchmark and SQLite-oriented test scaffolding now route setup through the SQLite backend package seam rather than calling core SQLite bootstrap
The DuckDB package-local query and maintenance implementation is now the supported production surface rather than a migration-only compatibility layer.
Issue #30 also moves DuckDB schema repair for legacy nullable edge tables out
of ordinary read-only opens. Repair now happens when a write session starts,
so ctx, sym, calls, symlist, audit, and other query commands do not
pay repair cost during warm reads.
Contributor Contract Validation Backend
Issue #9 adds a minimal in-memory backend for contributor-facing contract
validation. It lives in tests/memory_backend.py and is covered by
tests/test_memory_backend.py.
Use the in-memory backend when changing code that may affect the
IndexBackend contract or observable backend behavior, including:
- src/codira/contracts.py
- src/codira/indexer.py
- backend registry selection in src/codira/registry.py
- query-facing backend methods such as symbol lookup, docstring issues, call edges, callable references, include edges, and embedding inventory
- indexing lifecycle behavior such as full rebuilds, incremental reuse, deletion, runtime inventory, and analyzer inventory
The backend is intentionally not a production backend:
- it is not distributed as a codira-backend-memory package
- it is not available from normal installs
- it is not selected by running CODIRA_INDEX_BACKEND=memory codira index
- it does not persist data outside the Python process
Tests select it by installing a fake codira.backends entry point or by
patching the active backend in the real indexing path. This keeps the registry
and indexer contract exercised without presenting memory as a supported
operator-facing backend.
When extending the backend contract, update both the SQLite backend and the in-memory backend. Contract tests should compare observable behavior between SQLite and memory rather than SQLite internals, so regressions expose hidden coupling to SQL tables, row ids, or SQLite-specific query behavior.
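A behavior-level parity check might look like the following sketch. The two toy classes and their add_symbol and find_symbol methods are hypothetical and far smaller than the real backends or tests/memory_backend.py; the point is only that the assertion compares what each backend returns, never how it stores rows.

```python
import sqlite3


class ToySqliteBackend:
    """Hypothetical SQL-backed store using the standard-library sqlite3 module."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE symbols (name TEXT, path TEXT)")

    def add_symbol(self, name: str, path: str) -> None:
        self._conn.execute(
            "INSERT INTO symbols (name, path) VALUES (?, ?)", (name, path)
        )

    def find_symbol(self, name: str) -> list:
        rows = self._conn.execute(
            "SELECT path FROM symbols WHERE name = ? ORDER BY path", (name,)
        )
        return [row[0] for row in rows]


class ToyMemoryBackend:
    """Hypothetical in-process store with no SQL tables or row ids."""

    def __init__(self) -> None:
        self._symbols = []

    def add_symbol(self, name: str, path: str) -> None:
        self._symbols.append((name, path))

    def find_symbol(self, name: str) -> list:
        return sorted(p for n, p in self._symbols if n == name)


def check_backend_parity(backends) -> tuple:
    """Feed identical input to every backend and require identical answers."""
    results = []
    for backend in backends:
        backend.add_symbol("main", "b.py")
        backend.add_symbol("main", "a.py")
        backend.add_symbol("helper", "a.py")
        results.append((backend.find_symbol("main"), backend.find_symbol("missing")))
    assert all(result == results[0] for result in results), results
    return results[0]
```

A test that instead asserted on insertion order or SQLite row ids would pass for one backend and fail for the other, which is the kind of hidden coupling the comparison is meant to surface.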