
parse-inventory

This module enriches the raw font inventory produced by dump_fonts by performing script, language, and writing-system inference.

It operates purely on JSON data and never inspects font binaries.


Responsibilities

  • Infer primary script(s)
  • Infer language coverage
  • Normalize Unicode coverage information
  • Attach inference metadata to each font entry

Scope and non-responsibilities

The parse-inventory stage is responsible for validating and normalizing inventory data produced by earlier stages.

It is not responsible for:

  • discovering fonts on the system
  • inspecting font files directly
  • extracting raw font metadata
  • generating output artifacts
  • performing LaTeX compilation

All font discovery and metadata extraction are performed upstream by the dump stage.

All output generation is handled by the catalog generation stage.

This separation ensures that:

  • parsing remains deterministic
  • validation rules are centralized
  • pipeline stages remain loosely coupled

Inspection Mode

parse-inventory also provides a lightweight reporting mode for already-generated inventories:

fontshow parse-inventory --list-missing-language-coverage

This mode:

  • reads the selected inventory file,
  • lists fonts whose coverage.languages field is empty,
  • exits without writing an output file.

Output is deterministic and preserves inventory order.
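The check this mode performs can be sketched in a few lines of Python. This is only an illustration, not the actual implementation: it assumes the inventory root holds a "fonts" list, that each entry carries "path" and "coverage" keys as in the examples on this page, and that a missing coverage.languages field counts as empty. The helper name is hypothetical.

```python
import json


def list_missing_language_coverage(inventory: dict) -> list[str]:
    """Return paths of fonts whose coverage.languages is empty, in inventory order."""
    missing = []
    for font in inventory.get("fonts", []):
        coverage = font.get("coverage") or {}
        # Assumption: an absent or null "languages" field is treated as empty.
        if not coverage.get("languages"):
            missing.append(font.get("path", "<unknown>"))
    return missing


# Hypothetical sample inventory, for illustration only.
sample = json.loads("""
{
  "fonts": [
    {"path": "/fonts/A.ttf", "coverage": {"languages": ["en", "it"]}},
    {"path": "/fonts/B.ttf", "coverage": {"languages": []}},
    {"path": "/fonts/C.ttf", "coverage": {}}
  ]
}
""")

for path in list_missing_language_coverage(sample):
    print(path)
```

Because the function only iterates over the list and never sorts it, the report order follows the inventory order.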

Structured warnings

Fontshow uses structured warnings to report non-fatal issues detected during inventory parsing, validation, and inference.

Warnings are designed to be:

  • deterministic
  • machine-readable
  • attached directly to the relevant inventory node

No warning affects inference results.


Warning model

Warnings are represented as dictionaries with the following structure:

{
  "code": "missing_declared_languages",
  "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
  "severity": "info"
}

Each warning has:

  • code: a stable, machine-readable identifier suitable for filtering or tooling.

  • message: a human-readable description intended for end users.

  • severity: a qualitative severity level. Current values include "info", "warning", and "error" (reserved for future use).

Warning attachment

Warnings are attached directly to the inventory node they refer to:

  • Inventory-level warnings: attached to the inventory root object (e.g. schema issues).

  • Font-level warnings: attached to individual font entries (e.g. missing declared metadata).

Example:

{
  "fonts": [
    {
      "path": "...",
      "warnings": [
        {
          "code": "missing_declared_languages",
          "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
          "severity": "info"
        }
      ]
    }
  ]
}
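Because warnings live on the node they describe, tooling can collect them with a simple traversal. The sketch below is a hypothetical helper, not part of Fontshow; it assumes inventory-level warnings are stored under the same "warnings" key as font-level ones.

```python
from typing import Iterator


def iter_warnings(inventory: dict) -> Iterator[tuple[str, dict]]:
    """Yield (location, warning) pairs for the root and for every font entry."""
    # Assumption: root-level warnings use the same "warnings" key as fonts.
    for warning in inventory.get("warnings", []):
        yield "<inventory>", warning
    for font in inventory.get("fonts", []):
        for warning in font.get("warnings", []):
            yield font.get("path", "<unknown>"), warning


# Hypothetical inventory fragment, for illustration only.
inventory = {
    "warnings": [
        {"code": "schema_issue", "message": "example", "severity": "warning"}
    ],
    "fonts": [
        {
            "path": "/fonts/A.ttf",
            "warnings": [
                {
                    "code": "missing_declared_languages",
                    "message": "example",
                    "severity": "info",
                }
            ],
        },
        {"path": "/fonts/B.ttf"},
    ],
}

# Filter by severity using the stable, machine-readable fields.
info_only = [
    (where, w["code"]) for where, w in iter_warnings(inventory)
    if w["severity"] == "info"
]
print(info_only)
```

The "schema_issue" code is made up for the example; only "missing_declared_languages" appears in this documentation.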

Warning API

All warnings are created using a single canonical API:

add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None

Where:

  • target is either the inventory root dictionary or a single font entry
  • the target dictionary is modified in place

This API replaces earlier ad-hoc helpers and ensures a consistent and unambiguous warning model across the entire codebase.
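A minimal sketch of a function with this signature follows. It is an assumption about the behavior, not the Fontshow source: it stores warnings under a "warnings" key (as in the example above) and rejects unknown severities, which the real implementation may or may not do.

```python
ALLOWED_SEVERITIES = {"info", "warning", "error"}


def add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None:
    """Append a structured warning to the target node, modifying it in place."""
    # Assumption: severity values outside the documented set are rejected.
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    target.setdefault("warnings", []).append(
        {"code": code, "message": message, "severity": severity}
    )


# Usage: attach a warning to a single font entry.
font = {"path": "/fonts/A.ttf"}
add_structured_warning(
    font,
    code="missing_declared_languages",
    message="No declared languages available from FontConfig",
    severity="info",
)
print(font["warnings"])
```

Keyword-only arguments make each call site self-describing, which fits the goal of a single unambiguous warning API.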


Semantic validation of language codes

In addition to JSON Schema validation, Fontshow provides semantic checks to ensure that all declared and inferred language codes are valid ISO 639 identifiers.

This validation step detects issues that cannot be expressed in JSON Schema, such as invalid or mistyped language codes.

The following sources are checked:

  • coverage.languages (declared languages)
  • inference.languages (inferred languages)

Semantic validation is performed by the fontshow validate-inventory command and emits structured warnings without failing the validation process.
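The idea can be illustrated with a short sketch. It is not the validate-inventory implementation: the code set is a tiny hand-picked sample of ISO 639-1 codes rather than the full registry, and the "invalid_language_code" warning code and the helper name are hypothetical.

```python
# Tiny illustrative sample; a real check would use the full ISO 639 registry.
SAMPLE_ISO_639_CODES = frozenset({"ar", "de", "en", "fr", "it", "zh"})


def check_language_codes(font: dict) -> list[dict]:
    """Return structured warnings for unknown codes; never raises."""
    findings = []
    for section in ("coverage", "inference"):
        data = font.get(section) or {}
        for code in data.get("languages") or []:
            if code not in SAMPLE_ISO_639_CODES:
                findings.append(
                    {
                        # Hypothetical warning code, for illustration only.
                        "code": "invalid_language_code",
                        "message": (
                            f"{section}.languages contains unknown "
                            f"language code {code!r}"
                        ),
                        "severity": "warning",
                    }
                )
    return findings


font = {
    "coverage": {"languages": ["en", "xx1"]},
    "inference": {"languages": ["it", "eng1ish"]},
}
for finding in check_language_codes(font):
    print(finding["message"])
```

A JSON Schema can require that each entry is a string, but it cannot express membership in the ISO 639 code list without embedding the whole list, which is why the check is done in code.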


Language normalization and validation

This section documents how parse-inventory handles language data extracted from font metadata.

Language processing is intentionally split into two independent stages:

  • normalization
  • validation

These stages affect behavior but do not change the inventory schema.


Language normalization

Normalization is a best-effort transformation applied to language tags in order to improve consistency across heterogeneous font metadata.

Normalization MAY include:

  • canonical casing (e.g. en-us → en-US)
  • replacement of deprecated subtags
  • removal of unsupported private extensions
  • mapping of legacy identifiers where possible

Normalization:

  • does not guarantee correctness
  • does not enforce standards
  • does not fail processing

Its purpose is to reduce noise while preserving information.
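As an illustration of the canonical-casing step only, the sketch below applies the casing convention recommended by RFC 5646: subtags are lowercase, except that four-letter subtags are title-cased and two-letter subtags are upper-cased when they are neither the first subtag nor preceded by a singleton. The function name is hypothetical, and the other normalization steps listed above are not covered.

```python
def normalize_casing(tag: str) -> str:
    """Apply RFC 5646 recommended casing to a hyphen-separated language tag."""
    parts = tag.split("-")
    result = [parts[0].lower()]  # the first subtag is always lowercase
    for previous, subtag in zip(parts, parts[1:]):
        after_singleton = len(previous) == 1
        if len(subtag) == 4 and subtag.isalpha() and not after_singleton:
            result.append(subtag.title())  # script, e.g. "Hant"
        elif len(subtag) == 2 and subtag.isalpha() and not after_singleton:
            result.append(subtag.upper())  # region, e.g. "US"
        else:
            result.append(subtag.lower())
    return "-".join(result)


for raw in ("en-us", "ZH-hant-tw", "sr-LATN-rs", "en-x-US"):
    print(raw, "->", normalize_casing(raw))
```

The function never raises and never checks whether the subtags exist, matching the best-effort nature of normalization: casing is tidied, but correctness is left to validation.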


Validation modes

parse-inventory supports two validation modes.

Permissive mode (default)

  • Invalid or deprecated language tags are accepted
  • Warnings may be emitted
  • Processing continues
  • Normalized values may be produced

This mode prioritizes compatibility with real-world font metadata.


Strict mode (--strict-bcp47)

When enabled:

  • Only RFC-compliant BCP-47 language tags are accepted
  • Deprecated or malformed tags cause a hard failure
  • No silent normalization is applied
  • Inventory generation stops on first violation

Strict mode:

  • affects validation only
  • does not alter schema structure
  • does not change output layout
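The difference between the two modes can be sketched as follows. This is a simplified illustration under stated assumptions: the pattern accepts only language, optional script, and optional region subtags, which is much narrower than the full BCP-47 grammar, and the helper name and the "invalid_language_tag" warning code are hypothetical.

```python
import re

# Simplified well-formedness pattern: language[-Script][-REGION].
# Assumption: the real strict check covers far more of the BCP-47 grammar.
_SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?")


def check_language_tags(tags: list[str], *, strict: bool = False) -> list[dict]:
    """Permissive mode collects warnings; strict mode raises on first violation."""
    warnings = []
    for tag in tags:
        if _SIMPLE_TAG.fullmatch(tag):
            continue
        if strict:
            # Hard failure: processing stops at the first bad tag.
            raise ValueError(f"invalid BCP-47 language tag: {tag!r}")
        warnings.append(
            {
                "code": "invalid_language_tag",  # hypothetical code
                "message": f"Language tag {tag!r} is not well-formed",
                "severity": "warning",
            }
        )
    return warnings


tags = ["en-US", "english", "zh-Hant-TW"]

# Permissive: the bad tag is reported and processing continues.
print(check_language_tags(tags))

# Strict: the same input raises.
try:
    check_language_tags(tags, strict=True)
except ValueError as exc:
    print("strict mode failed:", exc)
```

Both calls examine the same data and return the same shape of result, so switching modes changes only whether a violation is fatal.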

CLI Notes

Common parse-inventory usage patterns:

# Enrich the raw inventory
fontshow parse-inventory

# Validate an enriched inventory only
fontshow parse-inventory -I

# List fonts missing declared language coverage
fontshow parse-inventory --list-missing-language-coverage

# Enforce strict BCP-47 validation while enriching
fontshow parse-inventory --strict-bcp47

Design principles

  • Normalization ≠ validation
  • Validation ≠ enforcement
  • Enforcement is always explicit
  • Behavior is deterministic and observable

Non-goals

  • Automatic language inference
  • Linguistic correctness guarantees
  • Silent mutation of source metadata

Design notes

  • Warnings are informational only and never block processing.
  • Declared metadata is never modified based on warnings.
  • The warning system is intentionally minimal and extensible.
  • Wrapper functions for warning emission were removed in C4.3 to avoid ambiguity and duplicated semantics.

Semantic Validation

parse-inventory does not perform semantic validation.

At this stage:

  • language normalization is performed
  • inference may occur
  • warnings may be generated

Semantic validation is deferred to later pipeline stages (e.g. create-catalog), where strict validation rules may apply.


API reference

fontshow.cli.parse_inventory

Fontshow parse-inventory CLI command.

This module implements the inventory enrichment stage of the Fontshow pipeline. It reads a raw inventory produced by dump-fonts, performs deterministic metadata inference, and produces a normalized inventory ready for validation and catalog generation.

Responsibilities

  • Load and validate the structure of a raw Fontshow inventory.
  • Perform deterministic inference of scripts and languages.
  • Enrich inventory entries with derived metadata.
  • Serialize the normalized inventory for downstream processing.

Design principles

This stage operates exclusively on JSON inventory data and performs no direct inspection of font binaries. All inference logic must be deterministic so that identical inputs produce identical outputs.

Architectural role

This module belongs to the CLI interface layer and implements the inventory enrichment stage of the Fontshow processing pipeline.

build_parser

build_parser(parser: ArgumentParser) -> None

Register parse-inventory CLI arguments on an existing parser.

Parameters

parser : argparse.ArgumentParser
    Parser instance to configure for the parse-inventory command.

Returns

None
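A function of this shape could register the flags shown in the CLI notes roughly as follows. This is a sketch, not the Fontshow source: the help texts and the function name are assumptions, only the two long flags documented on this page are included, and the -I option is left out because its exact form is not specified here.

```python
import argparse


def build_parser_sketch(parser: argparse.ArgumentParser) -> None:
    """Register a subset of parse-inventory flags on an existing parser."""
    parser.add_argument(
        "--list-missing-language-coverage",
        action="store_true",
        help="List fonts whose coverage.languages is empty and exit.",
    )
    parser.add_argument(
        "--strict-bcp47",
        action="store_true",
        help="Fail on language tags that are not valid BCP-47.",
    )


parser = argparse.ArgumentParser(prog="fontshow parse-inventory")
build_parser_sketch(parser)

# argparse maps "--strict-bcp47" to the attribute "strict_bcp47".
args = parser.parse_args(["--strict-bcp47"])
print(args.strict_bcp47, args.list_missing_language_coverage)
```

Configuring a parser passed in by the caller, rather than creating one, lets a top-level dispatcher attach the same arguments to a subcommand parser.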

main

main(args) -> int

Public CLI entrypoint (kept stable).

Parameters

args : argparse.Namespace
    Parsed CLI arguments controlling parse-inventory execution.

Returns

int
    Process exit code returned by the CLI workflow.

Notes

Thin wrapper around the injectable runner. Unexpected TypeError exceptions are converted into exit code 2 after user-facing error reporting and performance tracing.

parse_inventory

parse_inventory(data: dict[str, Any], level: str, *, strict_bcp47: bool = False) -> dict[str, Any]

Parse and enrich a font inventory structure.

Parameters

data : dict[str, Any]
    Raw inventory structure to validate, enrich, and update in place.

level : str
    Inference aggressiveness level forwarded to metadata processing.

strict_bcp47 : bool, optional
    Whether language-tag normalization must reject non-compliant BCP-47 values.

Returns

dict[str, Any]
    Enriched inventory structure with updated metadata and per-font inferred fields.

Raises

ValueError
    Propagated when schema validation or downstream metadata helpers reject the input inventory.

Notes

Refactored version:

  • reduced complexity
  • separated concerns
  • behavior unchanged

The function validates the input first, then processes charset, inference, language metadata, and specimen generation for each font before updating top-level inventory metadata.

register_cli

register_cli(parser) -> None

Register parse-inventory CLI arguments.

Parameters

parser : argparse.ArgumentParser
    Parser instance configured by the top-level dispatcher.

Returns

None

Notes

This function is used by the top-level fontshow dispatcher.

run_parse_font_inventory

run_parse_font_inventory(args, *, parse_inventory_fn=parse_inventory, validate_inventory_fn=validate_inventory, read_text_fn=None, write_text_fn=None) -> int

Run the internal parse-font-inventory CLI flow.

Parameters

args : argparse.Namespace
    Parsed CLI arguments controlling input, output, validation-only mode, and inference strictness.

parse_inventory_fn : callable, optional
    Injectable inventory enrichment function used for testing.

validate_inventory_fn : callable, optional
    Injectable validation function used in validate-only mode.

read_text_fn : callable | None, optional
    Optional file-reading adapter. Defaults to _default_read_text.

write_text_fn : callable | None, optional
    Optional file-writing adapter. Defaults to _default_write_text.

Returns

int
    Process exit code for the parse-inventory workflow.

Raises

json.JSONDecodeError
    May propagate indirectly from the injected read/parse path if malformed JSON is not intercepted by the caller.

OSError
    May propagate from injected read or write adapters.

ValueError
    May propagate from parse_inventory_fn if enrichment or strict validation rejects the loaded inventory.

Notes

Refactored version:

  • reduced complexity
  • helpers extracted
  • behavior unchanged

The runner validates platform compatibility and schema integrity before either executing validate-only mode or producing an enriched inventory and writing it to disk.