Performance Benchmarking

Codira performance measurements are developer-facing tooling. They must not change normal CLI behavior or make timing values part of the test contract.

Tools

Required system tools:

hyperfine for command-level timing

Python development extras:

pyinstrument for optional profile reports
snakeviz for optional local inspection of .prof files

psutil and hyperfine are treated as system-level tools in this repository, not Python development dependencies.

Artifact Layout

Benchmark artifacts are written under:

.artifacts/benchmarks/<run-id>/

Saved JSON artifacts include:

UTC run timestamp
Codira version
Git commit ID
active analyzer/backend plugin inventory
manifest path when applicable
availability of hyperfine, pyinstrument, and snakeviz
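
The fields listed above could be assembled along the following lines. This is a minimal sketch, not the repository's implementation: the function name collect_metadata, every dictionary key, and the choice to pass the version, commit ID, and plugin inventory in as arguments are assumptions made for illustration.

```python
import shutil
from datetime import datetime, timezone

# Tools whose availability is recorded, taken from the list above.
TOOLS = ("hyperfine", "pyinstrument", "snakeviz")


def collect_metadata(version, commit, plugins, manifest_path=None):
    """Build a metadata dict; all key names here are hypothetical."""
    metadata = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "codira_version": version,
        "git_commit": commit,
        "plugins": sorted(plugins),
        # shutil.which returns None when no executable is found on PATH.
        "tools": {name: shutil.which(name) is not None for name in TOOLS},
    }
    # The manifest path is recorded only when one applies.
    if manifest_path is not None:
        metadata["manifest_path"] = str(manifest_path)
    return metadata
```

Recording availability as plain booleans keeps the artifact comparable across machines where the tools live at different paths.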

Profile artifacts are written under:

.artifacts/benchmarks/<run-id>/profiles/

Codira index state for campaign commands is isolated from each target repository and written under:

.artifacts/benchmarks/<run-id>/indexes/<category-label>/

The campaign runner passes this directory through --output-dir for index and ctx commands.
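
The directory layout described in this section can be expressed as a small helper. The sketch below is illustrative only: run_layout is a hypothetical name, and it assumes the <category-label> segment is the category and the label joined by a hyphen, which the text above does not spell out.

```python
from pathlib import Path


def run_layout(run_id, category, label, root=".artifacts/benchmarks"):
    """Return the directories used for one run and one repository."""
    run_dir = Path(root) / run_id
    return {
        "run": run_dir,
        "profiles": run_dir / "profiles",
        # Assumption: "<category-label>" means "<category>-<label>".
        "index": run_dir / "indexes" / f"{category}-{label}",
    }
```

Under that assumption, the "index" entry is the value that would be handed to --output-dir.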

Manifest Format

The campaign runner expects a JSON manifest:

{
  "repositories": [
    {
      "label": "codira",
      "category": "small",
      "path": "/path/to/codira",
      "query": "schema migration logic",
      "modes": ["cold", "warm", "partial_change"]
    },
    {
      "label": "fontshow",
      "category": "medium",
      "path": "/path/to/fontshow"
    },
    {
      "label": "texlive",
      "category": "large",
      "path": "/path/to/texlive"
    }
  ]
}

Each repository entry requires:

label
category
path

Optional fields:

query, defaulting to schema migration logic
modes, defaulting to cold, warm, and partial_change
commands, a list of Codira subcommand argv arrays benchmarked through hyperfine in addition to the default index --full, warm index, and ctx --json commands

Missing repository paths fail fast before commands are run.
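
The required fields, defaults, and fail-fast path check can be sketched as follows. The function name load_manifest and the exception types are assumptions; the actual loader in scripts/benchmark_campaign.py may be structured differently.

```python
import json
from pathlib import Path

REQUIRED = ("label", "category", "path")
DEFAULT_QUERY = "schema migration logic"
DEFAULT_MODES = ["cold", "warm", "partial_change"]


def load_manifest(manifest_path):
    """Load a manifest, apply defaults, and fail fast on bad entries."""
    data = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    repos = []
    for entry in data["repositories"]:
        missing = [key for key in REQUIRED if key not in entry]
        if missing:
            raise ValueError(f"manifest entry missing fields: {missing}")
        # Reject missing repository paths before any command is run.
        if not Path(entry["path"]).is_dir():
            raise FileNotFoundError(f"repository path not found: {entry['path']}")
        repos.append({
            **entry,
            "query": entry.get("query", DEFAULT_QUERY),
            "modes": list(entry.get("modes", DEFAULT_MODES)),
        })
    return repos
```

Validating every entry up front means a typo in the last repository path is reported immediately rather than after the earlier repositories have been benchmarked.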

When commands is present, each entry is a JSON array of tokens excluding the Codira executable itself, for example:

["sym", "build_parser", "--json"]

Supported manifest-benchmark subcommands are:

help
index
cov
sym
symlist
emb
calls
refs
audit
ctx
plugins
caps

Manifest command tokens may use these placeholders:

{path} for the repository root path
{output_dir} for the isolated Codira state directory
{query} for the repository query string

For path-aware subcommands, the campaign runner appends --path and --output-dir automatically when they are not already present.
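
Placeholder expansion and the automatic --path and --output-dir handling can be sketched like this. The name expand_command is hypothetical, and the PATH_AWARE set is a placeholder: the text above does not say which subcommands count as path-aware.

```python
# Placeholder set for illustration; the real list of path-aware
# subcommands is not given in this document.
PATH_AWARE = frozenset({"index", "ctx", "sym"})


def expand_command(tokens, path, output_dir, query, path_aware=PATH_AWARE):
    """Substitute placeholders, then append missing path options."""
    values = {"{path}": path, "{output_dir}": output_dir, "{query}": query}
    expanded = []
    for token in tokens:
        for placeholder, value in values.items():
            token = token.replace(placeholder, value)
        expanded.append(token)
    if expanded and expanded[0] in path_aware:
        # Only exact option tokens are detected; "--path=..." is not.
        if "--path" not in expanded:
            expanded += ["--path", path]
        if "--output-dir" not in expanded:
            expanded += ["--output-dir", output_dir]
    return expanded
```

With these assumptions, ["sym", "build_parser", "--json"] gains both options, while a command that already passes --path keeps its own value and only receives --output-dir.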

Adaptive Command Resolution

Before building the measured command matrix for one repository, the campaign runs a discovery pass:

build a temporary Codira index under /tmp
run codira symlist --json against that temporary index
score discovered function and method symbols by graph connectivity
resolve adaptive benchmark commands toward richer repo-specific targets
persist the selector provenance without persisting the temporary index
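
The scoring step can be pictured with a simple degree count. This document does not define how connectivity is computed, so the sketch below assumes it is the number of call-graph edges touching a symbol; the function name, the edge format, and the tie-breaking rule are all hypothetical.

```python
from collections import Counter


def rank_by_connectivity(symbols, edges):
    """Rank symbol names by degree in a call graph.

    Assumption: connectivity is in-degree plus out-degree, with edges
    given as (caller, callee) pairs. The real scoring may differ.
    """
    degree = Counter()
    for caller, callee in edges:
        degree[caller] += 1
        degree[callee] += 1
    # Descending degree, then name, so ties resolve deterministically.
    return sorted(symbols, key=lambda name: (-degree[name], name))
```

Breaking ties by name keeps the ranking stable between runs, which matters when the selector provenance is compared across campaigns.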

Adaptive resolution currently applies to these command families:

sym
symlist
emb
calls
refs
ctx

Resolution rules:

sym, calls, and refs try ranked symbol candidates and keep the highest-scoring command that returns JSON status: "ok" with non-empty results
emb and ctx share one resolved semantic query chosen from the manifest query plus discovered symbol and module names
symlist is kept only when it returns a non-empty symbol inventory
commands that cannot be resolved to meaningful output are skipped for that repository instead of aborting the full campaign
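
The rule for sym, calls, and refs can be sketched as a loop over ranked candidates. The names is_meaningful and resolve_command are hypothetical, run stands in for whatever executes a command and returns its stdout, and the "results" key name is an assumption about the JSON payload.

```python
import json


def is_meaningful(stdout):
    """True when output is JSON with status "ok" and non-empty results."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(payload, dict)
        and payload.get("status") == "ok"
        and bool(payload.get("results"))  # assumed key name
    )


def resolve_command(family, ranked_symbols, run):
    """Return the first command, in rank order, with meaningful output.

    `run` is a hypothetical callable taking an argv list and returning
    stdout as text. A None result means the command is skipped.
    """
    for symbol in ranked_symbols:
        argv = [family, symbol, "--json"]
        if is_meaningful(run(argv)):
            return argv
    return None
```

Returning None instead of raising matches the skip-rather-than-abort behavior: the caller can record the command as skipped and continue with the rest of the campaign.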

The selector writes one JSON provenance artifact per repository under:

.artifacts/benchmarks/<run-id>/selection/

These artifacts record:

requested query and requested commands
resolved query and resolved commands
skipped commands
discovery candidate scores and command trial details

The main campaign-plan.json also records:

requested_query
query (resolved)
requested_commands
resolved_commands
skipped_commands
selection_artifact

Campaign Command

Inspect the planned commands without executing:

python scripts/benchmark_campaign.py benchmarks.json --dry-run

Run a campaign:

python scripts/benchmark_campaign.py benchmarks.json --runs 10 --warmup 2

Use a stable run identifier when comparing artifacts:

python scripts/benchmark_campaign.py benchmarks.json --run-id 20260430-baseline
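
The flags used in the commands above map onto a command-line parser roughly as follows. This is a sketch for orientation, not the script's actual parser: the defaults shown are placeholders, and the real script may define additional options.

```python
import argparse


def build_parser():
    """Parser covering only the flags shown in this section."""
    parser = argparse.ArgumentParser(description="Benchmark campaign (sketch)")
    parser.add_argument("manifest", help="path to the JSON manifest")
    parser.add_argument("--dry-run", action="store_true",
                        help="print planned commands without executing")
    # Placeholder defaults; the script's real defaults are not documented here.
    parser.add_argument("--runs", type=int, default=10)
    parser.add_argument("--warmup", type=int, default=2)
    parser.add_argument("--run-id", default=None,
                        help="stable identifier for the artifact directory")
    return parser
```

Tests can exercise a parser like this directly with parse_args, which checks argument handling without running any benchmark.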

For issue #30, record paired short-manifest runs for both first-party backends before broad campaigns:

CODIRA_INDEX_BACKEND=sqlite python scripts/benchmark_campaign.py \
  benchmarks/short_benchmark.local.json --run-id issue-30-short-sqlite
CODIRA_INDEX_BACKEND=duckdb python scripts/benchmark_campaign.py \
  benchmarks/short_benchmark.local.json --run-id issue-30-short-duckdb

Capture the resulting artifact paths in the branch execution ledger before running broader manifests.

Plugin Requirements

Benchmark metadata depends on each analyzer and backend exposing stable plugin identity through the normal Codira plugin registry.

First-party analyzer/backend plugins must provide:

stable plugin name
stable provider distribution name
implementation version
deterministic discovery globs for analyzers
deterministic loading status through codira plugins --json

The first-party plugin set included in benchmark metadata tests is:

codira-analyzer-python
codira-analyzer-json
codira-analyzer-c
codira-analyzer-bash
codira-backend-sqlite
codira-backend-duckdb

New first-party analyzer or backend packages must update the shared benchmark plugin provider list and its tests.
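
A coverage check over the provider list could look like the sketch below. The provider names come from the list above; the function name missing_providers and the assumption that each inventory item is a dict with a "provider" key are hypothetical, since the schema of codira plugins --json is not described here.

```python
FIRST_PARTY_PROVIDERS = frozenset({
    "codira-analyzer-python",
    "codira-analyzer-json",
    "codira-analyzer-c",
    "codira-analyzer-bash",
    "codira-backend-sqlite",
    "codira-backend-duckdb",
})


def missing_providers(inventory, expected=FIRST_PARTY_PROVIDERS):
    """Return expected providers absent from a plugin inventory.

    Assumption: `inventory` is a list of dicts with a "provider" key;
    the real `codira plugins --json` schema may differ.
    """
    found = {item.get("provider") for item in inventory}
    return sorted(expected - found)
```

A test built on a helper like this fails with the names of the missing packages, which points directly at the provider list entry that needs updating.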

Validation Policy

Automated tests validate parser behavior, dry-run command construction, manifest loading, metadata shape, and first-party plugin coverage. They do not assert exact timing values.

Performance campaigns are manual developer workflows. Normal CI should validate the scripts without running noisy wall-clock benchmark gates.