Language Analyzers

The current repository now gets its default analyzers from first-party plugin packages:

Python through the extracted codira-analyzer-python first-party package
JSON through the extracted codira-analyzer-json first-party package for deterministic structured document families such as JSON Schema, package.json, and .releaserc.json
C for the first non-Python proof required by ADR-004, installed through the extracted codira-analyzer-c first-party package
Bash through the extracted codira-analyzer-bash first-party package

Current Analyzer Responsibilities

The Python analysis path currently performs:

module, class, and function extraction
import collection
static call-record extraction
callable-reference extraction
docstring validation integration

Today these responsibilities are concentrated in the analyzer packages and compatibility modules:

src/codira/parser_ast.py
src/codira/analyzers/python.py
src/codira/analyzers/json.py
src/codira/analyzers/c.py
packages/codira-analyzer-python/
packages/codira-analyzer-json/
packages/codira-analyzer-c/
packages/codira-analyzer-bash/
src/codira/indexer.py for analyzer routing only

Current Scope Boundary

scanner.iter_project_files() now derives discovery globs from the active analyzer set instead of relying on a hard-coded core tuple.

Each analyzer declares deterministic discovery_globs, and scanner discovery uses those globs for candidate discovery before confirming ownership through supports_path() for both:

Git-backed tracked-file discovery
filesystem fallback outside Git repositories
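
The two-step discovery described above (glob-based candidate discovery, then ownership confirmation) can be sketched as follows. This is a minimal illustration, not the codira scanner: SketchAnalyzer and discover are hypothetical names, and matching globs against the file name only is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SketchAnalyzer:
    """Hypothetical stand-in exposing only the two members used here."""

    name: str
    discovery_globs: tuple[str, ...]

    def supports_path(self, path: PurePosixPath) -> bool:
        # Assumption: ownership is decided by file name alone.
        return any(fnmatchcase(path.name, glob) for glob in self.discovery_globs)


def discover(paths, analyzers):
    """Return (path, analyzer name) pairs in a deterministic order."""
    # The discovery globs are the union of what the active analyzers declare.
    globs = sorted({glob for a in analyzers for glob in a.discovery_globs})
    owned = []
    for raw in sorted(paths):
        path = PurePosixPath(raw)
        # Step 1: candidate discovery from the declared globs.
        if not any(fnmatchcase(path.name, glob) for glob in globs):
            continue
        # Step 2: confirm ownership; the first analyzer in order wins.
        for analyzer in analyzers:
            if analyzer.supports_path(path):
                owned.append((raw, analyzer.name))
                break
    return owned
```

The same function can be fed either a list of Git-tracked paths or the result of a filesystem walk, which is why the sketch takes plain path strings rather than touching the disk itself.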

Phase 19 adds a second scanner path for canonical coverage auditing:

src/
tests/
scripts/

Tracked files under those directories are inspected for coverage even if no active analyzer claims them yet.
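
A coverage audit of this kind can be sketched as below. The directory names come from the list above; audit_coverage and the is_claimed callback are hypothetical and only illustrate the idea of reporting tracked files that no analyzer owns.

```python
from pathlib import PurePosixPath

CANONICAL_DIRS = ("src", "tests", "scripts")  # directories named in the text


def audit_coverage(tracked_files, is_claimed):
    """Split tracked files under the canonical directories into two lists.

    is_claimed is a hypothetical callback that answers whether any active
    analyzer owns the given path.
    """
    covered, uncovered = [], []
    for raw in sorted(tracked_files):
        parts = PurePosixPath(raw).parts
        if not parts or parts[0] not in CANONICAL_DIRS:
            continue  # outside the audited directories
        (covered if is_claimed(raw) else uncovered).append(raw)
    return covered, uncovered
```

Files that land in the second list are the interesting output: they are tracked and in scope for the audit, but no active analyzer claims them yet.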

The retrieval-capability migration does not currently widen analyzer responsibilities.

Analyzer packages still own:

language-specific parsing
normalized artifact extraction
durable symbol identity for indexed artifacts

They do not yet need to implement RetrievalProducer. Retrieval-facing capability metadata currently lives in shared query producer descriptors instead.

Accepted Migration Direction

ADR-004 expands this boundary by accepting:

multiple analyzers in one indexing run
mixed-language repositories as a first-class target
a future proof-of-concept analyzer beyond Python, with C named as the preferred first validation target

Phase-6 Baseline

Phase 6 now extracts the current Python analysis path into src/codira/analyzers/python.py.

That module owns:

Python parsing through parser_ast.parse_file()
normalization into AnalysisResult
Python file acceptance through the LanguageAnalyzer contract
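
The shape of such a module can be sketched as follows. The names LanguageAnalyzer and AnalysisResult appear in this document, but their real fields and method signatures are not shown here, so everything below (the Sketch-suffixed classes, the analyze_file method, the chosen fields) is an assumption. The standard-library ast module stands in for parser_ast.parse_file().

```python
import ast
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class AnalysisResultSketch:
    """Hypothetical normalized result; the real AnalysisResult may differ."""

    module: str
    functions: tuple[str, ...] = ()
    imports: tuple[str, ...] = ()


@runtime_checkable
class LanguageAnalyzerSketch(Protocol):
    """Hypothetical contract: accept a path, then analyze its source."""

    name: str

    def supports_path(self, path: Path) -> bool: ...

    def analyze_file(self, path: Path, source: str) -> AnalysisResultSketch: ...


class PythonAnalyzerSketch:
    name = "python"

    def supports_path(self, path: Path) -> bool:
        return path.suffix == ".py"

    def analyze_file(self, path: Path, source: str) -> AnalysisResultSketch:
        tree = ast.parse(source)
        # Only top-level definitions and imports are collected in this sketch.
        functions = tuple(
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        )
        imports = []
        for node in tree.body:
            if isinstance(node, ast.Import):
                imports.extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imports.append(node.module)
        return AnalysisResultSketch(path.stem, functions, tuple(imports))
```

Keeping parsing and normalization behind one small interface is what lets other languages plug in later without changing the callers.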

Phase-8 Registration Rules

Phase 8 moves analyzer registration into src/codira/registry.py.

analyzers are instantiated from built-ins plus entry-point plugin discovery
registry order defines deterministic first-match routing order after scanner-side ownership filtering
an empty analyzer registry raises ValueError
extracted analyzers may be omitted when their plugin packages are not installed
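
These rules can be sketched as below. This is not the code in src/codira/registry.py: build_registry, route, and SuffixAnalyzer are hypothetical, and plugin discovery is modeled as a list of (name, factory) pairs rather than real entry points. Treating a missing package as an ImportError raised by the factory is also an assumption.

```python
class SuffixAnalyzer:
    """Hypothetical analyzer that claims paths by suffix."""

    def __init__(self, name, suffix):
        self.name = name
        self.suffix = suffix

    def supports_path(self, path):
        return path.endswith(self.suffix)


def build_registry(builtin_factories, plugin_factories):
    """Instantiate analyzers: built-ins first, then plugins in the given order.

    plugin_factories is a list of (name, zero-argument callable) pairs that
    stands in for discovered entry points.
    """
    analyzers = [factory() for factory in builtin_factories]
    for _name, factory in plugin_factories:
        try:
            analyzers.append(factory())
        except ImportError:
            continue  # extracted analyzer whose package is not installed
    if not analyzers:
        raise ValueError("analyzer registry is empty")
    return analyzers


def route(path, analyzers):
    """Return the first analyzer in registry order that supports the path."""
    for analyzer in analyzers:
        if analyzer.supports_path(path):
            return analyzer
    return None
```

Because routing is first-match, the order in which analyzers are registered is part of the observable behavior, so the sketch preserves the caller's ordering instead of sorting.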

Current JSON Family Boundary

The first-party JSON analyzer is intentionally family-based rather than generic.

Supported families:

JSON Schema documents
npm-style package.json manifests
semantic-release .releaserc.json configuration

Supported JSON symbols currently include:

schema definition names
schema property paths
package names
package script keys
package dependency names
semantic-release branch names
semantic-release plugin identifiers

Explicitly unsupported JSON inputs include:

lockfiles such as package-lock.json
VS Code workspace JSONC files under .vscode/
generic unclassified JSON blobs

This keeps JSON indexing deterministic and query-oriented without broadening support to arbitrary machine-generated artifacts.
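
A family-based classifier along these lines might look like the sketch below. The function names and family labels are hypothetical. Treating a top-level "$schema" key as the marker of a JSON Schema document, and reading only the "dependencies" key for dependency names, are both assumptions and not the analyzer's documented rules.

```python
import json
from pathlib import PurePosixPath

UNSUPPORTED_NAMES = {"package-lock.json"}  # lockfile named in the text


def classify_json(path, text):
    """Return a family label for a supported document, else None."""
    p = PurePosixPath(path)
    if p.name in UNSUPPORTED_NAMES or ".vscode" in p.parts:
        return None
    if p.name == "package.json":
        return "package_manifest"
    if p.name == ".releaserc.json":
        return "semantic_release"
    try:
        document = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Assumption: a top-level "$schema" key marks a JSON Schema document.
    if isinstance(document, dict) and "$schema" in document:
        return "json_schema"
    return None  # generic unclassified JSON


def package_symbols(document):
    """Collect (kind, name) symbols from a parsed package.json mapping."""
    symbols = []
    if isinstance(document.get("name"), str):
        symbols.append(("package", document["name"]))
    for key in sorted(document.get("scripts", {})):
        symbols.append(("script", key))
    for key in sorted(document.get("dependencies", {})):
        symbols.append(("dependency", key))
    return symbols
```

Returning None for anything outside the known families, instead of falling back to a generic extractor, is what keeps the result deterministic.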

Phase-9 Second Analyzer Proof

Phase 9 added the C analyzer proof and registered it after Python.

Python keeps the full AST-driven extraction path
C currently extracts module identity, include dependencies, and top-level function definitions
mixed-language repositories are now indexed in one deterministic run

The C analyzer is intentionally narrow. It exists to prove the abstraction and file-routing model before any deeper C-specific call analysis work.

Current C Parser Boundary

The C analyzer is now backed by tree-sitter-c.

That gives the branch:

parse-tree-based function extraction
parse-tree-based include extraction
parse-tree-based call extraction for direct calls and member-access (attribute-style) calls
a safer foundation for future include-graph and symbol-parity work

The normalized artifact model and backend contracts remain unchanged. Only the language-specific C parsing strategy has been upgraded.

Dependency Boundary

The packaging surface now distinguishes core codira dependencies from analyzer-specific dependencies.

core install keeps the contracts, CLI, registry, and compatibility shims available
the Python analyzer loads when codira-analyzer-python is installed
the JSON analyzer loads when codira-analyzer-json is installed
the C analyzer loads when codira-analyzer-c is installed
the Bash analyzer loads when codira-analyzer-bash is installed
the supported package form for C-family indexing is codira-analyzer-c
third-party analyzers must declare their own discovery globs so indexing can see their files without core changes

When those plugin packages are absent, registry activation deterministically skips the matching analyzer, and indexing a matching path fails with an explicit installation hint instead of an import-time crash.
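
The failure mode for a missing plugin package can be sketched as follows. The package names are the ones listed in this document; the suffix mapping, the AnalyzerNotInstalledError class, and require_analyzer are hypothetical and only illustrate how a path lookup can fail with an installation hint.

```python
from pathlib import PurePosixPath

# Assumption: each first-party package is keyed by a single file suffix.
PACKAGE_FOR_SUFFIX = {
    ".py": "codira-analyzer-python",
    ".json": "codira-analyzer-json",
    ".c": "codira-analyzer-c",
    ".sh": "codira-analyzer-bash",
}


class AnalyzerNotInstalledError(RuntimeError):
    """Hypothetical error carrying an installation hint."""


def require_analyzer(path, installed_packages):
    """Return the package that owns the path, or raise with a hint."""
    package = PACKAGE_FOR_SUFFIX.get(PurePosixPath(path).suffix)
    if package is None:
        return None  # no first-party analyzer targets this path
    if package not in installed_packages:
        raise AnalyzerNotInstalledError(
            f"no analyzer is available for {path!r}; install the {package} package"
        )
    return package
```

Raising at lookup time, and not when the core package is imported, means a core-only install stays usable for every path that does not need the missing analyzer.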