parse-inventory¶

This module enriches the raw font inventory produced by dump-fonts by inferring scripts, languages, and writing systems.

It operates purely on JSON data and never inspects font binaries.

Responsibilities¶

Infer primary script(s)
Infer language coverage
Normalize Unicode coverage information
Attach inference metadata to each font entry

Scope and non-responsibilities¶

The parse-inventory stage is responsible for validating and normalizing inventory data produced by earlier stages.

It is not responsible for:

discovering fonts on the system
inspecting font files directly
extracting raw font metadata
generating output artifacts
performing LaTeX compilation

All font discovery and metadata extraction are performed upstream by the dump stage.

All output generation is handled by the catalog generation stage.

This separation ensures that:

parsing remains deterministic
validation rules are centralized
pipeline stages remain loosely coupled

Inspection Mode¶

parse-inventory also provides a lightweight reporting mode for already-generated inventories:

fontshow parse-inventory --list-missing-language-coverage

This mode:

reads the selected inventory file,
lists fonts whose coverage.languages field is empty,
exits without writing an output file.

Output is deterministic and preserves inventory order.
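The reporting logic described above can be sketched in a few lines of Python. This is an illustrative sketch, not Fontshow's implementation: the helper name list_missing_language_coverage is hypothetical, and treating a missing coverage or languages field the same as an empty one is an assumption.

```python
from typing import Any


def list_missing_language_coverage(inventory: dict[str, Any]) -> list[str]:
    """Return paths of fonts whose coverage.languages is empty, in inventory order."""
    missing = []
    for font in inventory.get("fonts", []):
        # Assumption: an absent field is treated like an empty list.
        languages = font.get("coverage", {}).get("languages") or []
        if not languages:
            missing.append(font.get("path", "<unknown>"))
    return missing


inventory = {
    "fonts": [
        {"path": "a.ttf", "coverage": {"languages": ["en"]}},
        {"path": "b.ttf", "coverage": {"languages": []}},
        {"path": "c.ttf", "coverage": {}},
    ]
}
missing_paths = list_missing_language_coverage(inventory)
print(missing_paths)
```

Because the function only iterates over the fonts list and never sorts or writes anything, the result follows inventory order and nothing is written to disk.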

Structured warnings¶

Fontshow uses structured warnings to report non-fatal issues detected during inventory parsing, validation, and inference.

Warnings are designed to be:

deterministic
machine-readable
attached directly to the relevant inventory node

No warning affects inference results.

Warning model¶

Warnings are represented as dictionaries with the following structure:

{
  "code": "missing_declared_languages",
  "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
  "severity": "info"
}
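A consumer of the inventory could check that a warning has this shape before relying on it. The sketch below is a hypothetical helper, not part of Fontshow. It assumes the three keys are always present and that severity is one of the three strings named in this section, although the documentation lists those values as current rather than exhaustive.

```python
# Severity strings named in this section; the set may grow in future versions.
ALLOWED_SEVERITIES = {"info", "warning", "error"}
REQUIRED_KEYS = {"code", "message", "severity"}


def is_well_formed_warning(warning: object) -> bool:
    """Return True if the value looks like a structured warning dictionary."""
    if not isinstance(warning, dict):
        return False
    if not REQUIRED_KEYS.issubset(warning):
        return False
    if not all(isinstance(warning[key], str) for key in REQUIRED_KEYS):
        return False
    return warning["severity"] in ALLOWED_SEVERITIES


example = {
    "code": "missing_declared_languages",
    "message": "No declared languages available from FontConfig",
    "severity": "info",
}
print(is_well_formed_warning(example))
```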

Each warning has:

code: A stable, machine-readable identifier suitable for filtering or tooling.
message: A human-readable description intended for end users.
severity: A qualitative severity level. Current values include:
"info"
"warning"
"error" (reserved for future use)

Warning attachment¶

Warnings are attached directly to the inventory node they refer to:

Inventory-level warnings: attached to the inventory root object (e.g. schema issues).
Font-level warnings: attached to individual font entries (e.g. missing declared metadata).

Example:

{
  "fonts": [
    {
      "path": "...",
      "warnings": [
        {
          "code": "missing_declared_languages",
          "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
          "severity": "info"
        }
      ]
    }
  ]
}
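Because warnings live on the node they describe, tooling can gather them with a single walk over the inventory. The following is a minimal sketch with a hypothetical helper name. It assumes inventory-level warnings are stored under a "warnings" key on the root object, mirroring the font-level key shown above.

```python
from typing import Any, Iterator


def iter_warnings(inventory: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (location, warning) pairs: root warnings first, then per font in order."""
    # Assumption: the root uses the same "warnings" key as font entries.
    for warning in inventory.get("warnings", []):
        yield "<inventory>", warning
    for font in inventory.get("fonts", []):
        for warning in font.get("warnings", []):
            yield font.get("path", "<unknown>"), warning


inventory = {
    "warnings": [{"code": "schema_issue", "message": "example", "severity": "warning"}],
    "fonts": [
        {"path": "a.ttf"},
        {
            "path": "b.ttf",
            "warnings": [
                {"code": "missing_declared_languages", "message": "example", "severity": "info"}
            ],
        },
    ],
}
collected = [(where, w["code"]) for where, w in iter_warnings(inventory)]
print(collected)
```

The "schema_issue" code in the sample data is made up for illustration; only "missing_declared_languages" appears in this documentation.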

Warning API¶

All warnings are created using a single canonical API:

add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None

Where:

target is either the inventory root dictionary or a single font entry
the target dictionary is modified in place

This API replaces earlier ad-hoc helpers and ensures a consistent and unambiguous warning model across the entire codebase.
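One plausible body for this function is shown below. Only the signature comes from the documentation. The body is an assumption about how in-place attachment might work: it appends to a "warnings" list on the target and creates the list if it is absent.

```python
def add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None:
    """Attach a structured warning to the target dictionary in place (sketch)."""
    # Assumption: warnings accumulate in a list under the "warnings" key.
    target.setdefault("warnings", []).append(
        {"code": code, "message": message, "severity": severity}
    )


font = {"path": "example.ttf"}
add_structured_warning(
    font,
    code="missing_declared_languages",
    message="No declared languages available from FontConfig",
    severity="info",
)
print(font["warnings"])
```

The keyword-only arguments keep call sites explicit, which fits the stated goal of an unambiguous warning model.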

Semantic validation of language codes¶

In addition to JSON Schema validation, Fontshow provides semantic checks to ensure that all declared and inferred language codes are valid ISO 639 identifiers.

This validation step detects issues that cannot be expressed in JSON Schema, such as invalid or mistyped language codes.

The following sources are checked:

coverage.languages (declared languages)
inference.languages (inferred languages)

Semantic validation is performed by the fontshow validate-inventory command and emits structured warnings without failing the validation process.
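A rough idea of such a check is sketched below. It is not the validate-inventory implementation: it only tests that the primary subtag of each code has the two- or three-letter shape of an ISO 639 identifier, whereas a real check would also consult the code registry. The helper name is hypothetical.

```python
import re
from typing import Any

# Shape of an ISO 639-1/-2/-3 identifier: two or three ASCII letters.
# Assumption: a shape test stands in for a real registry lookup.
_PRIMARY_SUBTAG = re.compile(r"[a-z]{2,3}")


def find_suspicious_language_codes(font: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (source, code) pairs whose primary subtag is not ISO 639 shaped."""
    findings = []
    for section in ("coverage", "inference"):
        for code in font.get(section, {}).get("languages", []):
            primary = str(code).split("-", 1)[0].lower()
            if not _PRIMARY_SUBTAG.fullmatch(primary):
                findings.append((f"{section}.languages", code))
    return findings


font = {
    "coverage": {"languages": ["en", "english"]},
    "inference": {"languages": ["zh-Hant", "x1"]},
}
findings = find_suspicious_language_codes(font)
print(findings)
```

Each finding could then be reported as a structured warning on the font entry rather than as a failure, matching the non-failing behavior described above.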

Language normalization and validation¶

This section documents how parse-inventory handles language data extracted from font metadata.

Language processing is intentionally split into two independent stages:

normalization
validation

These stages affect behavior but do not change the inventory schema.

Language normalization¶

Normalization is a best-effort transformation applied to language tags in order to improve consistency across heterogeneous font metadata.

Normalization MAY include:

canonical casing (e.g. en-us → en-US)
replacement of deprecated subtags
removal of unsupported private extensions
mapping of legacy identifiers where possible

Normalization:

does not guarantee correctness
does not enforce standards
does not fail processing

Its purpose is to reduce noise while preserving information.
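Two of the transformations listed above, canonical casing and replacement of deprecated subtags, can be sketched as follows. This is an illustrative helper, not Fontshow's normalizer. The function name is hypothetical, the deprecated-subtag table holds only three well-known entries, and subtags are classified by position and length alone, which is a simplification of the BCP-47 rules.

```python
# A few deprecated ISO 639 codes and their replacements (deliberately incomplete).
DEPRECATED_LANGUAGE_SUBTAGS = {"iw": "he", "in": "id", "ji": "yi"}


def normalize_language_tag(tag: str) -> str:
    """Best-effort casing and deprecated-subtag cleanup; never raises on odd input."""
    parts = tag.strip().split("-")
    normalized = []
    after_singleton = False
    for index, part in enumerate(parts):
        if index == 0:
            lowered = part.lower()
            normalized.append(DEPRECATED_LANGUAGE_SUBTAGS.get(lowered, lowered))
        elif after_singleton or len(part) == 1:
            # Extension and private-use subtags stay lowercase.
            after_singleton = True
            normalized.append(part.lower())
        elif len(part) == 4 and part.isalpha():
            normalized.append(part.title())  # script, e.g. Hant
        elif len(part) == 2 and part.isalpha():
            normalized.append(part.upper())  # region, e.g. US
        else:
            normalized.append(part.lower())
    return "-".join(normalized)


print(normalize_language_tag("en-us"))
print(normalize_language_tag("ZH-hant-tw"))
print(normalize_language_tag("iw"))
```

The function returns a cleaned-up string for any input and leaves the question of whether the tag is valid to a separate validation step.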

Validation modes¶

parse-inventory supports two validation modes.

Permissive mode (default)¶

Invalid or deprecated language tags are accepted
Warnings may be emitted
Processing continues
Normalized values may be produced

This mode prioritizes compatibility with real-world font metadata.

Strict mode (--strict-bcp47)¶

When enabled:

Only RFC-compliant BCP-47 language tags are accepted
Deprecated or malformed tags cause a hard failure
No silent normalization is applied
Inventory generation stops on first violation

Strict mode:

affects validation only
does not alter schema structure
does not change output layout
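The difference between the two modes can be illustrated with one function that either collects warnings or fails on the first violation. This is a sketch under stated assumptions: the function name and the "invalid_language_tag" warning code are hypothetical, and the regular expression is a simplified shape check rather than the full RFC 5646 grammar.

```python
import re

# Simplified well-formedness check; NOT the full RFC 5646 grammar.
_TAG_SHAPE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def check_language_tags(tags, *, strict_bcp47=False):
    """Permissive: return warnings and continue. Strict: raise on the first bad tag."""
    warnings = []
    for tag in tags:
        if _TAG_SHAPE.fullmatch(tag):
            continue
        if strict_bcp47:
            raise ValueError(f"invalid BCP-47 language tag: {tag!r}")
        warnings.append(
            {
                "code": "invalid_language_tag",  # hypothetical warning code
                "message": f"Language tag {tag!r} is not well-formed",
                "severity": "warning",
            }
        )
    return warnings


permissive = check_language_tags(["en-US", "not a tag", "zh-Hant"])
print(len(permissive))

try:
    check_language_tags(["en-US", "not a tag"], strict_bcp47=True)
except ValueError as exc:
    strict_error = str(exc)
    print(strict_error)
```

The same input produces the same data structures in both modes; only the reaction to a violation changes, which is why strict mode has no effect on schema or output layout.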

CLI Notes¶

Common parse-inventory usage patterns:

# Enrich the raw inventory
fontshow parse-inventory

# Validate an enriched inventory only
fontshow parse-inventory -I

# List fonts missing declared language coverage
fontshow parse-inventory --list-missing-language-coverage

# Enforce strict BCP-47 validation while enriching
fontshow parse-inventory --strict-bcp47

Design principles¶

Normalization ≠ validation
Validation ≠ enforcement
Enforcement is always explicit
Behavior is deterministic and observable

Non-goals¶

Automatic language inference
Linguistic correctness guarantees
Silent mutation of source metadata

Design notes¶

Warnings are informational only and never block processing.
Declared metadata is never modified based on warnings.
The warning system is intentionally minimal and extensible.
Wrapper functions for warning emission were removed in C4.3 to avoid ambiguity and duplicated semantics.

Semantic Validation¶

parse-inventory does not perform semantic validation.

At this stage:

language normalization is performed
inference may occur
warnings may be generated

Semantic validation is deferred to later pipeline stages (e.g. create-catalog), where strict validation rules may apply.

API reference¶

fontshow.cli.parse_inventory ¶

Fontshow parse-inventory CLI command.

This module implements the inventory enrichment stage of the Fontshow pipeline. It reads a raw inventory produced by dump-fonts, performs deterministic metadata inference, and produces a normalized inventory ready for validation and catalog generation.

Responsibilities¶

Load and validate the structure of a raw Fontshow inventory.
Perform deterministic inference of scripts and languages.
Enrich inventory entries with derived metadata.
Serialize the normalized inventory for downstream processing.

Design principles¶

This stage operates exclusively on JSON inventory data and performs no direct inspection of font binaries. All inference logic must be deterministic so that identical inputs produce identical outputs.

Architectural role¶

This module belongs to the CLI interface layer and implements the inventory enrichment stage of the Fontshow processing pipeline.

build_parser ¶

build_parser(parser: ArgumentParser) -> None

Register parse-inventory CLI arguments on an existing parser.

Parameters¶

parser : argparse.ArgumentParser Parser instance to configure for the parse-inventory command.

Returns¶

None
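As a rough picture of what registering arguments on an existing parser looks like, the sketch below wires up the three flags shown in the CLI notes with argparse. The flag spellings come from this page. Everything else is an assumption: the dest name validate_only, the help strings, and the idea that -I is a boolean switch rather than an option taking a value.

```python
import argparse


def build_parser(parser: argparse.ArgumentParser) -> None:
    """Register parse-inventory style flags on an existing parser (sketch)."""
    parser.add_argument(
        "--strict-bcp47",
        action="store_true",
        help="Fail on language tags that are not valid BCP-47.",
    )
    parser.add_argument(
        "--list-missing-language-coverage",
        action="store_true",
        help="List fonts whose coverage.languages is empty, then exit.",
    )
    # Assumption: -I is a boolean switch; its real dest and semantics may differ.
    parser.add_argument(
        "-I",
        dest="validate_only",
        action="store_true",
        help="Validate an enriched inventory without enriching it.",
    )


parser = argparse.ArgumentParser(prog="fontshow parse-inventory")
build_parser(parser)
args = parser.parse_args(["--strict-bcp47"])
print(args.strict_bcp47, args.list_missing_language_coverage, args.validate_only)
```

Configuring a parser that the caller passes in, instead of creating one, lets a top-level dispatcher reuse the same function for a subcommand parser.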

main ¶

main(args) -> int

Public CLI entrypoint (kept stable).

Parameters¶

args : argparse.Namespace Parsed CLI arguments controlling parse-inventory execution.

Returns¶

int Process exit code returned by the CLI workflow.

Notes¶

Thin wrapper around the injectable runner. Unexpected TypeError exceptions are converted into exit code 2 after user-facing error reporting and performance tracing.
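The wrapper pattern in the note above can be sketched as follows. This is not the actual main: the name main_sketch, the runner parameter, and the stand-in runner are hypothetical, and performance tracing is omitted.

```python
import sys


def _failing_runner(args) -> int:
    """Hypothetical stand-in for the injectable runner; always fails."""
    raise TypeError("unexpected argument shape")


def main_sketch(args, runner=_failing_runner) -> int:
    """Delegate to the runner and turn an unexpected TypeError into exit code 2."""
    try:
        return runner(args)
    except TypeError as exc:
        # User-facing error report goes to stderr; the traceback is suppressed.
        print(f"error: {exc}", file=sys.stderr)
        return 2


print(main_sketch(None, runner=lambda args: 0))
print(main_sketch(None))
```

Keeping the wrapper this thin means the runner can be tested directly with injected dependencies while the public entrypoint stays stable.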

parse_inventory ¶

parse_inventory(data: dict[str, Any], level: str, *, strict_bcp47: bool = False) -> dict[str, Any]

Parse and enrich a font inventory structure.

Parameters¶

data : dict[str, Any] Raw inventory structure to validate, enrich, and update in place.
level : str Inference aggressiveness level forwarded to metadata processing.
strict_bcp47 : bool, optional Whether language-tag normalization must reject non-compliant BCP-47 values.

Returns¶

dict[str, Any] Enriched inventory structure with updated metadata and per-font inferred fields.

Raises¶

ValueError Propagated when schema validation or downstream metadata helpers reject the input inventory.

Notes¶

Refactored version:

reduced complexity
separated concerns
behavior unchanged

The function validates the input first, then processes charset, inference, language metadata, and specimen generation for each font before updating top-level inventory metadata.

register_cli ¶

register_cli(parser) -> None

Register parse-inventory CLI arguments.

Parameters¶

parser : argparse.ArgumentParser Parser instance configured by the top-level dispatcher.

Returns¶

None

Notes¶

This function is used by the top-level fontshow dispatcher.

run_parse_font_inventory ¶

run_parse_font_inventory(args, *, parse_inventory_fn=parse_inventory, validate_inventory_fn=validate_inventory, read_text_fn=None, write_text_fn=None) -> int

Run the internal parse-font-inventory CLI flow.

Parameters¶

args : argparse.Namespace Parsed CLI arguments controlling input, output, validation-only mode, and inference strictness.
parse_inventory_fn : callable, optional Injectable inventory enrichment function used for testing.
validate_inventory_fn : callable, optional Injectable validation function used in validate-only mode.
read_text_fn : callable | None, optional Optional file-reading adapter. Defaults to _default_read_text.
write_text_fn : callable | None, optional Optional file-writing adapter. Defaults to _default_write_text.

Returns¶

int Process exit code for the parse-inventory workflow.

Raises¶

json.JSONDecodeError May propagate indirectly from the injected read/parse path if malformed JSON is not intercepted by the caller.
OSError May propagate from injected read or write adapters.
ValueError May propagate from parse_inventory_fn if enrichment or strict validation rejects the loaded inventory.

Notes¶

Refactored version:

reduced complexity
helpers extracted
behavior unchanged

The runner validates platform compatibility and schema integrity before either executing validate-only mode or producing an enriched inventory and writing it to disk.
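The injectable parameters make it possible to exercise a runner of this kind without touching the file system. The sketch below shows the idea with a simplified stand-in, not the real run_parse_font_inventory. The name run_sketch, the args.input and args.output attributes, the adapter signatures, and the single-argument enrichment function are all assumptions made for illustration.

```python
import json
from types import SimpleNamespace


def run_sketch(args, *, parse_inventory_fn, read_text_fn, write_text_fn) -> int:
    """Hypothetical stand-in for the runner: read, enrich, write, return an exit code."""
    data = json.loads(read_text_fn(args.input))
    enriched = parse_inventory_fn(data)
    write_text_fn(args.output, json.dumps(enriched, sort_keys=True))
    return 0


# In-memory "file system" used in place of real files.
files = {"raw.json": json.dumps({"fonts": [{"path": "a.ttf"}]})}


def fake_read(path):
    return files[path]


def fake_write(path, text):
    files[path] = text


def fake_enrich(data):
    # Stand-in for enrichment: attach an empty inference block to each font.
    for font in data["fonts"]:
        font["inference"] = {"languages": []}
    return data


args = SimpleNamespace(input="raw.json", output="enriched.json")
exit_code = run_sketch(
    args,
    parse_inventory_fn=fake_enrich,
    read_text_fn=fake_read,
    write_text_fn=fake_write,
)
written = json.loads(files["enriched.json"])
print(exit_code, written)
```

With every side effect passed in as a callable, a test can assert on the exit code and on the written text directly.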
