parse-inventory¶
This module enriches the raw font inventory produced by dump_fonts
by performing script, language, and writing-system inference.
It operates purely on JSON data and never inspects font binaries.
Responsibilities¶
- Infer primary script(s)
- Infer language coverage
- Normalize Unicode coverage information
- Attach inference metadata to each font entry
Scope and non-responsibilities¶
The parse-inventory stage is responsible for validating and normalizing
inventory data produced by earlier stages.
It is not responsible for:
- discovering fonts on the system
- inspecting font files directly
- extracting raw font metadata
- generating output artifacts
- performing LaTeX compilation
All font discovery and metadata extraction are performed upstream by the dump stage.
All output generation is handled by the catalog generation stage.
This separation ensures that:
- parsing remains deterministic
- validation rules are centralized
- pipeline stages remain loosely coupled
Inspection Mode¶
parse-inventory also provides a lightweight reporting mode for
already-generated inventories:
fontshow parse-inventory --list-missing-language-coverage
This mode:
- reads the selected inventory file,
- lists fonts whose coverage.languages field is empty,
- exits without writing an output file.
Output is deterministic and preserves inventory order.
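The same report can be approximated outside the CLI. The following is a minimal sketch, not Fontshow's implementation: it assumes the inventory has a top-level fonts list and that each entry carries a path key and a nested coverage.languages list. The helper name fonts_missing_language_coverage is hypothetical.

```python
from typing import Any


def fonts_missing_language_coverage(inventory: dict[str, Any]) -> list[str]:
    """Return the paths of fonts whose coverage.languages is empty or absent.

    Assumed layout (not confirmed against the Fontshow schema):
    {"fonts": [{"path": ..., "coverage": {"languages": [...]}}]}
    """
    missing = []
    for font in inventory.get("fonts", []):
        coverage = font.get("coverage") or {}
        if not coverage.get("languages"):  # covers None, [] and a missing key
            missing.append(font.get("path", "<unknown>"))
    return missing  # a plain loop, so inventory order is preserved


inventory = {
    "fonts": [
        {"path": "a.ttf", "coverage": {"languages": ["en"]}},
        {"path": "b.ttf", "coverage": {"languages": []}},
        {"path": "c.ttf"},
    ]
}
print(fonts_missing_language_coverage(inventory))
```

Because the function only reads the dictionary and returns a list, it mirrors the "no output file" behaviour of the reporting mode.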
Structured warnings¶
Fontshow uses structured warnings to report non-fatal issues detected during inventory parsing, validation, and inference.
Warnings are designed to be:
- deterministic
- machine-readable
- attached directly to the relevant inventory node
No warning affects inference results.
Warning model¶
Warnings are represented as dictionaries with the following structure:
{
  "code": "missing_declared_languages",
  "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
  "severity": "info"
}
Each warning has:
- code: A stable, machine-readable identifier suitable for filtering or tooling.
- message: A human-readable description intended for end users.
- severity: A qualitative severity level. Current values include:
  - "info"
  - "warning"
  - "error" (reserved for future use)
Warning attachment¶
Warnings are attached directly to the inventory node they refer to:
- Inventory-level warnings: Attached to the inventory root object (e.g. schema issues).
- Font-level warnings: Attached to individual font entries (e.g. missing declared metadata).
Example:
{
  "fonts": [
    {
      "path": "...",
      "warnings": [
        {
          "code": "missing_declared_languages",
          "message": "No declared languages available from FontConfig; inference.languages will be derived solely from Unicode data",
          "severity": "info"
        }
      ]
    }
  ]
}
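Because warnings live on the nodes they describe, a consumer can gather them in a single walk over the inventory. The sketch below assumes that inventory-level warnings use the same warnings key on the root object as font entries do; iter_warnings is a hypothetical helper name, not part of Fontshow.

```python
from typing import Any, Iterator


def iter_warnings(inventory: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (location, warning) pairs: root warnings first, then per-font ones."""
    for warning in inventory.get("warnings", []):
        yield "<inventory>", warning
    for font in inventory.get("fonts", []):
        for warning in font.get("warnings", []):
            yield font.get("path", "<unknown>"), warning


sample = {
    "warnings": [{"code": "schema_issue", "message": "example", "severity": "warning"}],
    "fonts": [
        {
            "path": "a.ttf",
            "warnings": [
                {"code": "missing_declared_languages", "message": "example", "severity": "info"}
            ],
        },
        {"path": "b.ttf"},
    ],
}

# Filter by the stable code field rather than by message text.
info_only = [(where, w["code"]) for where, w in iter_warnings(sample) if w["severity"] == "info"]
print(info_only)
```

The schema_issue code in the sample data is made up for illustration; only missing_declared_languages appears in this document.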
Warning API¶
All warnings are created using a single canonical API:
add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None
Where:
- target is either the inventory root dictionary or a single font entry
- the target dictionary is modified in place
This API replaces earlier ad-hoc helpers and ensures a consistent and unambiguous warning model across the entire codebase.
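To make the in-place behaviour concrete, here is a minimal stand-in with the documented signature. The body is an assumption: it simply appends to a warnings list on the target, which matches the attachment example in this section, while the real function may also deduplicate entries or check severity values.

```python
def add_structured_warning(
    target: dict,
    *,
    code: str,
    message: str,
    severity: str = "warning",
) -> None:
    """Stand-in sketch: append a warning dict to target["warnings"] in place."""
    # Assumption: warnings are stored under the "warnings" key; the list is
    # created on first use.
    target.setdefault("warnings", []).append(
        {"code": code, "message": message, "severity": severity}
    )


font_entry = {"path": "..."}
add_structured_warning(
    font_entry,
    code="missing_declared_languages",
    message="No declared languages available from FontConfig",
    severity="info",
)
print(font_entry["warnings"])
```

Keyword-only arguments after the target keep call sites unambiguous, which fits the stated goal of a single, consistent warning model.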
Semantic validation of language codes¶
In addition to JSON Schema validation, Fontshow provides semantic checks to ensure that all declared and inferred language codes are valid ISO 639 identifiers.
This validation step detects issues that cannot be expressed in JSON Schema, such as invalid or mistyped language codes.
The following sources are checked:
- coverage.languages (declared languages)
- inference.languages (inferred languages)
Semantic validation is performed by the fontshow validate-inventory command and
emits structured warnings without failing the validation process.
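A semantic check of this kind could look roughly like the sketch below. It is illustrative only: the set of known codes is a tiny hand-picked subset rather than a full ISO 639 registry, the warning code invalid_language_code is a hypothetical name, and the nested coverage / inference layout is assumed from the field names above.

```python
from typing import Any

# Tiny illustrative subset; a real check would consult a complete ISO 639 list.
KNOWN_ISO_639 = {"en", "fr", "de", "it", "ja", "zh", "ar"}


def check_language_codes(font: dict[str, Any]) -> list[dict[str, str]]:
    """Return one warning per tag whose primary subtag is not a known code."""
    warnings = []
    for section in ("coverage", "inference"):
        tags = (font.get(section) or {}).get("languages") or []
        for tag in tags:
            primary = tag.split("-")[0].lower()
            if primary not in KNOWN_ISO_639:
                warnings.append(
                    {
                        "code": "invalid_language_code",  # hypothetical code name
                        "message": f"{section}.languages contains unknown code {tag!r}",
                        "severity": "warning",
                    }
                )
    return warnings  # nothing is raised, so validation itself never fails


font = {
    "coverage": {"languages": ["en", "xx"]},
    "inference": {"languages": ["fr", "qq-Latn"]},
}
for warning in check_language_codes(font):
    print(warning["message"])
```

A JSON Schema can require that each entry is a string, but it cannot practically express "this string is a registered language code", which is why a lookup like this sits in a separate step.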
Language normalization and validation¶
This section documents how parse-inventory handles language data extracted
from font metadata.
Language processing is intentionally split into two independent stages:
- normalization
- validation
These stages affect behavior but do not change the inventory schema.
Language normalization¶
Normalization is a best-effort transformation applied to language tags in order to improve consistency across heterogeneous font metadata.
Normalization MAY include:
- canonical casing (e.g. en-us → en-US)
- replacement of deprecated subtags
- removal of unsupported private extensions
- mapping of legacy identifiers where possible
Normalization:
- does not guarantee correctness
- does not enforce standards
- does not fail processing
Its purpose is to reduce noise while preserving information.
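As an illustration of the canonical-casing step only, the sketch below applies the conventional BCP-47 casing: lowercase everywhere, except that four-letter subtags are title-cased and two-letter subtags are upper-cased when they are not the first subtag and do not follow a singleton. The function name normalize_tag_casing is hypothetical, and the sketch does not replace deprecated subtags or map legacy identifiers.

```python
def normalize_tag_casing(tag: str) -> str:
    """Best-effort casing normalization; never raises on odd input."""
    result = []
    after_singleton = False  # True once an extension/private-use marker is seen
    for index, subtag in enumerate(tag.split("-")):
        special = index > 0 and not after_singleton and subtag.isalpha()
        if special and len(subtag) == 4:
            result.append(subtag.title())  # script, e.g. Hant
        elif special and len(subtag) == 2:
            result.append(subtag.upper())  # region, e.g. US
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            after_singleton = True
    return "-".join(result)


print(normalize_tag_casing("en-us"))       # en-US
print(normalize_tag_casing("ZH-hant-tw"))  # zh-Hant-TW
```

The function changes casing only, so a malformed tag passes through with its content intact, consistent with reducing noise while preserving information.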
Validation modes¶
parse-inventory supports two validation modes.
Permissive mode (default)¶
- Invalid or deprecated language tags are accepted
- Warnings may be emitted
- Processing continues
- Normalized values may be produced
This mode prioritizes compatibility with real-world font metadata.
Strict mode (--strict-bcp47)¶
When enabled:
- Only RFC-compliant BCP-47 language tags are accepted
- Deprecated or malformed tags cause a hard failure
- No silent normalization is applied
- Inventory generation stops on first violation
Strict mode:
- affects validation only
- does not alter schema structure
- does not change output layout
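The difference between the two modes can be sketched as a single function with a strict_bcp47 switch. This is not Fontshow's validator: the regular expression is a deliberately simplified shape check (language, optional script, optional region) that is narrower than the full RFC 5646 grammar, and the warning code malformed_language_tag is a hypothetical name.

```python
import re

# Simplified shape: language[-Script][-REGION]. Variants, extensions and
# private-use subtags are intentionally not handled in this sketch.
SIMPLE_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z]{4})?(-(?:[A-Za-z]{2}|[0-9]{3}))?")


def validate_tags(tags: list[str], *, strict_bcp47: bool = False) -> list[dict[str, str]]:
    """Permissive: collect warnings and continue. Strict: stop at the first violation."""
    warnings = []
    for tag in tags:
        if SIMPLE_TAG.fullmatch(tag):
            continue
        if strict_bcp47:
            # Hard failure on the first violation, as described for strict mode.
            raise ValueError(f"invalid language tag: {tag!r}")
        warnings.append(
            {
                "code": "malformed_language_tag",  # hypothetical code name
                "message": f"language tag {tag!r} is not well-formed",
                "severity": "warning",
            }
        )
    return warnings


tags = ["en-US", "zh-Hant-TW", "english_us"]
print(validate_tags(tags))  # permissive: one warning, processing continues
try:
    validate_tags(tags, strict_bcp47=True)
except ValueError as exc:
    print(f"strict mode stopped: {exc}")
```

Both branches return or raise without touching the input list, reflecting that the mode affects validation only and leaves the data layout unchanged.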
CLI Notes¶
Common parse-inventory usage patterns:
# Enrich the raw inventory
fontshow parse-inventory
# Validate an enriched inventory only
fontshow parse-inventory -I
# List fonts missing declared language coverage
fontshow parse-inventory --list-missing-language-coverage
# Enforce strict BCP-47 validation while enriching
fontshow parse-inventory --strict-bcp47
Design principles¶
- Normalization ≠ validation
- Validation ≠ enforcement
- Enforcement is always explicit
- Behavior is deterministic and observable
Non-goals¶
- Automatic language inference
- Linguistic correctness guarantees
- Silent mutation of source metadata
Design notes¶
- Warnings are informational only and never block processing.
- Declared metadata is never modified based on warnings.
- The warning system is intentionally minimal and extensible.
- Wrapper functions for warning emission were removed in C4.3 to avoid ambiguity and duplicated semantics.
Semantic Validation¶
parse-font-inventory does not perform semantic validation.
At this stage:
- language normalization is performed
- inference may occur
- warnings may be generated
Semantic validation is deferred to later pipeline stages (e.g. create-catalog), where strict validation rules may apply.
API reference¶
fontshow.cli.parse_inventory ¶
Fontshow parse-inventory CLI command.
This module implements the inventory enrichment stage of the Fontshow
pipeline. It reads a raw inventory produced by dump-fonts, performs
deterministic metadata inference, and produces a normalized inventory
ready for validation and catalog generation.
Responsibilities¶
- Load and validate the structure of a raw Fontshow inventory.
- Perform deterministic inference of scripts and languages.
- Enrich inventory entries with derived metadata.
- Serialize the normalized inventory for downstream processing.
Design principles¶
This stage operates exclusively on JSON inventory data and performs no direct inspection of font binaries. All inference logic must be deterministic so that identical inputs produce identical outputs.
Architectural role¶
This module belongs to the CLI interface layer and implements the inventory enrichment stage of the Fontshow processing pipeline.
build_parser ¶
build_parser(parser: ArgumentParser) -> None
main ¶
main(args) -> int
Public CLI entrypoint (kept stable).
Parameters¶
args : argparse.Namespace
Parsed CLI arguments controlling parse-inventory execution.
Returns¶
int
Process exit code returned by the CLI workflow.
Notes¶
Thin wrapper around the injectable runner.
Unexpected TypeError exceptions are converted into exit code 2
after user-facing error reporting and performance tracing.
parse_inventory ¶
parse_inventory(data: dict[str, Any], level: str, *, strict_bcp47: bool = False) -> dict[str, Any]
Parse and enrich a font inventory structure.
Parameters¶
data : dict[str, Any]
Raw inventory structure to validate, enrich, and update in place.
level : str
Inference aggressiveness level forwarded to metadata processing.
strict_bcp47 : bool, optional
Whether language-tag normalization must reject non-compliant BCP-47 values.
Returns¶
dict[str, Any]
Enriched inventory structure with updated metadata and per-font inferred fields.
Raises¶
ValueError
Propagated when schema validation or downstream metadata helpers reject the input inventory.
Notes¶
Refactored version:
- reduced complexity
- separated concerns
- behavior unchanged
The function validates the input first, then processes charset, inference, language metadata, and specimen generation for each font before updating top-level inventory metadata.
register_cli ¶
register_cli(parser) -> None
run_parse_font_inventory ¶
run_parse_font_inventory(args, *, parse_inventory_fn=parse_inventory, validate_inventory_fn=validate_inventory, read_text_fn=None, write_text_fn=None) -> int
Run the internal parse-font-inventory CLI flow.
Parameters¶
args : argparse.Namespace
Parsed CLI arguments controlling input, output, validation-only
mode, and inference strictness.
parse_inventory_fn : callable, optional
Injectable inventory enrichment function used for testing.
validate_inventory_fn : callable, optional
Injectable validation function used in validate-only mode.
read_text_fn : callable | None, optional
Optional file-reading adapter. Defaults to _default_read_text.
write_text_fn : callable | None, optional
Optional file-writing adapter. Defaults to _default_write_text.
Returns¶
int
Process exit code for the parse-inventory workflow.
Raises¶
json.JSONDecodeError
May propagate indirectly from the injected read/parse path if
malformed JSON is not intercepted by the caller.
OSError
May propagate from injected read or write adapters.
ValueError
May propagate from parse_inventory_fn if enrichment or strict
validation rejects the loaded inventory.
Notes¶
Refactored version:
- reduced complexity
- helpers extracted
- behavior unchanged
The runner validates platform compatibility and schema integrity before either executing validate-only mode or producing an enriched inventory and writing it to disk.