Fontshow Pipeline¶
Overview¶
This document describes the Fontshow processing pipeline, from font discovery to catalog creation.
The pipeline is designed as a sequence of stages, each with a single, well-defined responsibility. Understanding stage boundaries is essential for debugging, validation, and diagnosing environment-related issues.
The goal of the pipeline is to:
- collect information about fonts installed on the system;
- normalize and validate this information;
- produce a final, usable catalog (currently in LaTeX format).
The guiding principle is separation of concerns: each stage can be executed, verified, and debugged independently.
Execution Environment¶
The pipeline described in this document assumes execution within a well-defined environment.
Supported, partial, and experimental environments are documented separately in environment-matrix.md.
Environmental mismatches are a common source of pipeline failures and should be evaluated before investigating application-level issues.
General Flow¶
The logical pipeline can be summarized as:
Preflight checks
↓
System font dump
↓
Inventory parsing, validation and enrichment
↓
Catalog creation (.tex generation)
↓
LuaLaTeX compilation (multi-pass)
Each stage produces one or more intermediate artifacts, which can be retained for later analysis.
Pipeline Stages and Artifact Locality¶
Fontshow’s pipeline is composed of distinct stages with different assumptions regarding execution environment and data locality.
Understanding these boundaries is essential for correct usage and for interpreting validation results.
Stage 0 — Preflight¶
Preflight validates that the current execution environment is suitable for running subsequent stages.
This includes:
- availability of required tools
- expected runtime capabilities
- environment consistency
Preflight results are not persisted and are valid only for the system on which they are executed.
Stage 1 — dump-fonts¶
This stage:
- inspects the local filesystem
- queries the system font configuration
- discovers installed fonts
- produces a serialized inventory
This stage is environment-dependent and must be run on the system whose fonts are being analyzed.
The resulting inventory is a data artifact, not a live reference.
Stage 2 — parse-inventory¶
This stage operates exclusively on serialized data.
Important properties:
- No font files are accessed
- No filesystem paths are resolved
- No environment assumptions are made
- All paths are treated as opaque data
This means:
✔ The inventory JSON can be moved across machines
✔ The stage is safe to run on a different system
✔ No font files are required at this stage
This behavior is intentional and enforced by design.
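The opaque-path guarantee can be illustrated with a minimal sketch. The field names `family` and `path` are illustrative assumptions, not Fontshow's actual inventory schema:

```python
import json

def parse_inventory(raw_json: str) -> list[dict]:
    """Parse a serialized inventory without touching the filesystem.

    Paths are carried through as opaque strings: no os.path calls,
    no existence checks, no resolution against the local machine.
    """
    parsed = []
    for entry in json.loads(raw_json):
        parsed.append({
            "family": entry["family"],  # illustrative field name
            "path": entry["path"],      # kept verbatim, never resolved
        })
    return parsed

# The inventory may reference paths from another machine entirely:
raw = '[{"family": "DejaVu Sans", "path": "/usr/share/fonts/dejavu/DejaVuSans.ttf"}]'
print(parse_inventory(raw))
```

Because no path is ever resolved, the same JSON artifact produces identical results on any machine.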
Stage 3 — create-catalog¶
This stage consumes the parsed inventory and produces catalog output.
While it does not re-scan fonts, it assumes that path references contained in the inventory are still meaningful for the current environment.
As a result:
- Catalog generation may succeed across systems
- Path-based features, however, depend on environment compatibility
- No path normalization or remapping is performed
Stage 4 — LaTeX Compilation¶
LaTeX compilation requires:
- actual access to font files
- correct font resolution by the TeX engine
- filesystem paths matching those recorded earlier
For this reason, LaTeX compilation must be performed on the same system (or an equivalent environment) where fonts are available.
This is an intentional design constraint.
Summary: Environment Assumptions¶
| Stage | Requires Same Machine | Uses Filesystem | Portable |
|---|---|---|---|
| Preflight | ✔ | ✔ | ❌ |
| dump-fonts | ✔ | ✔ | ❌ |
| parse-inventory | ❌ | ❌ | ✔ |
| create-catalog | ⚠️ | ⚠️ | Partial |
| LaTeX compile | ✔ | ✔ | ❌ |
This separation allows:
- reproducible inspection
- artifact-based workflows
- controlled cross-system validation
- predictable failure modes
Stage 0 — Preflight Checks¶
The preflight stage validates that the execution environment satisfies the minimum requirements for running the Fontshow pipeline safely.
Its purpose is to detect environment-level issues early and to prevent execution when required capabilities are missing or incompatible.
Preflight is responsible only for environment validation and does not perform any form of data processing or font analysis.
Documentation of stage 0¶
All details regarding preflight behavior, including:
- scope and responsibilities
- supported environments
- performed checks
- severity levels
- CLI and CI behavior
are documented in:
→ docs/tools/preflight.md
Role of stage 0 in the pipeline¶
The preflight stage acts as a gatekeeper for the pipeline:
- it runs before any other stage
- it may abort execution if requirements are not met
- it does not modify user data
- it does not perform font parsing or analysis
All subsequent stages assume a successful preflight execution.
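A tool-availability check of the kind preflight performs can be sketched as follows. The tool list is an illustrative assumption; the checks actually performed are documented in docs/tools/preflight.md:

```python
import shutil

# Illustrative tool list, not the real preflight check set.
REQUIRED_TOOLS = ("lualatex", "fc-list")

def preflight(required=REQUIRED_TOOLS) -> list[str]:
    """Return the names of required tools missing from PATH.

    The result describes only the current machine and, like the real
    preflight stage, is not persisted anywhere.
    """
    return [tool for tool in required if shutil.which(tool) is None]

# A failed check would typically abort the pipeline before any
# other stage runs:
missing = preflight()
if missing:
    print(f"preflight failed, missing tools: {missing}")
```

Note that the result is only meaningful for the machine it ran on, which is why preflight output is never treated as a portable artifact.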
Stage 1 — System font dump¶
This stage is responsible for discovering fonts available on the system and extracting raw font metadata required by downstream stages.
This stage focuses on data collection, not interpretation.
Documentation of stage 1¶
All implementation details related to the system font dump, including:
- discovery mechanisms and backends
- extracted metadata fields
- charset extraction (if applicable)
- diagnostic output and logging behavior
are documented in:
→ docs/tools/dump-fonts.md
Role of stage 1 in the pipeline¶
The system font dump stage:
- enumerates available fonts
- extracts raw metadata
- does not perform semantic interpretation
- does not apply inventory-level validation or enrichment
- does not generate catalog artifacts
The data produced here is consumed by the inventory parsing stage.
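One possible discovery backend is fontconfig's `fc-list`. The sketch below assumes its default output format (`path: family:style=style`); this is an assumption about that backend, not a statement of how Fontshow itself enumerates fonts:

```python
import json

def parse_fc_list_line(line: str) -> dict:
    """Split one line of `fc-list` default output into a raw record.

    Expected shape: /path/to/font.ttf: Family Name:style=Style
    """
    path, rest = line.split(": ", 1)
    family, _, style = rest.partition(":style=")
    return {"path": path, "family": family, "style": style}

# The dump stage only serializes what it found; interpretation
# (normalization, validation) is deferred to inventory parsing.
line = "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: DejaVu Sans:style=Book"
print(json.dumps([parse_fc_list_line(line)], indent=2))
```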
Stage 2 — Inventory parsing, validation and enrichment¶
This stage transforms raw font metadata into a structured and validated inventory representation.
It is responsible for:
- schema validation
- normalization of extracted metadata
- semantic validation
- enforcement of strict or permissive validation modes
Documentation of stage 2¶
All operational details for inventory parsing and validation are documented in:
→ docs/tools/parse-inventory.md
This includes:
- language normalization rules
- strict vs permissive validation behavior
- handling of deprecated or malformed data
- validation error semantics
Role of stage 2 in the pipeline¶
The inventory parsing stage:
- consumes raw font metadata
- produces a validated inventory representation
- applies semantic and structural checks
- does not perform font discovery
- does not generate output artifacts
All subsequent stages operate on the validated inventory produced here.
Stage 3 — Catalog generation¶
The catalog generation stage transforms the validated inventory into final output artifacts.
This stage is responsible for producing user-facing representations based on the processed inventory data.
Documentation of stage 3¶
All implementation details related to catalog generation, including:
- output formats
- LaTeX generation
- template handling
- error reporting and diagnostics
are documented in:
→ docs/tools/create-catalog.md
Role of stage 3 in the pipeline¶
The catalog generation stage:
- consumes validated inventory data
- produces final output artifacts
- does not perform validation or normalization
- does not modify inventory contents
The LaTeX sources produced here are consumed by the final compilation stage.
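The kind of transformation this stage performs can be sketched as inventory entries rendered through a template. The one-`\section`-per-family layout and the escaping rules below are illustrative assumptions, not Fontshow's actual template (see docs/tools/create-catalog.md):

```python
# Minimal escaping of LaTeX special characters in text mode
# (deliberately incomplete; enough for typical family names).
SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}

def latex_escape(text: str) -> str:
    return "".join(SPECIALS.get(ch, ch) for ch in text)

def catalog_lines(inventory: list[dict]) -> list[str]:
    """Render one \\section per family -- an illustrative template.

    Note that the inventory is read, never modified, and no
    validation is re-applied here.
    """
    return [rf"\section{{{latex_escape(e['family'])}}}" for e in inventory]

inv = [{"family": "DejaVu Sans", "path": "/usr/share/fonts/d/DejaVuSans.ttf"}]
print("\n".join(catalog_lines(inv)))  # -> \section{DejaVu Sans}
```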
Stage 4 — LaTeX compilation¶
The final catalog is compiled using LuaLaTeX.
Although LuaLaTeX may require multiple compilation passes to resolve indices and auxiliary constructs, this process is treated as a single logical stage in the pipeline.
Failures at this stage may be caused by:
- missing or incomplete LaTeX toolchains,
- font rendering issues,
- environment mismatches between discovery and compilation.
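A multi-pass compile driver might be sketched as below. The flags are common TeX Live batch-mode options rather than anything Fontshow-specific, and actually executing the commands (e.g. with `subprocess.run`) only works where the referenced fonts resolve:

```python
def lualatex_command(tex_file: str) -> list[str]:
    """Build the LuaLaTeX invocation for one pass.

    -interaction=nonstopmode and -halt-on-error are standard
    batch-mode flags; adjust as needed.
    """
    return ["lualatex", "-interaction=nonstopmode", "-halt-on-error", tex_file]

def compile_passes(tex_file: str, passes: int = 2) -> list[list[str]]:
    """Return the command sequence for a multi-pass compile.

    Multiple passes resolve indices and auxiliary constructs, but
    the whole loop is one logical pipeline stage.
    """
    return [lualatex_command(tex_file) for _ in range(passes)]

for cmd in compile_passes("catalog.tex"):
    print(" ".join(cmd))
```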
Pipeline artifacts¶
The pipeline produces several intermediate artifacts, including:
- font dumps;
- inventories;
- intermediate JSON files;
- final LaTeX files.
These artifacts:
- are not merely temporary outputs;
- can be used to compare different systems;
- facilitate testing, debugging, and validation.
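For instance, two inventory artifacts dumped on different machines can be compared without either machine's fonts being present. The `family` field name is an illustrative assumption:

```python
import json

def inventory_families(raw_json: str) -> set[str]:
    """Extract the set of family names from a serialized inventory."""
    return {entry["family"] for entry in json.loads(raw_json)}

def diff_inventories(a: str, b: str) -> dict:
    """Compare two inventory artifacts, e.g. dumps from two hosts."""
    fa, fb = inventory_families(a), inventory_families(b)
    return {"only_in_a": sorted(fa - fb), "only_in_b": sorted(fb - fa)}

host_a = '[{"family": "DejaVu Sans"}, {"family": "Noto Sans"}]'
host_b = '[{"family": "DejaVu Sans"}]'
print(diff_inventories(host_a, host_b))
```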
Environment considerations¶
Pipeline behavior may vary depending on the environment:
- native Linux;
- WSL;
- fontconfig configuration.
For this reason:
- some features are marked as experimental;
- full validation on native Linux is considered a required step.
Links¶
For further details on individual components:
- General architecture: architecture.md
- Data dictionary: data_dictionary.md
- Font dump: dump-fonts
- Inventory parsing: parse-inventory
- Catalog creation: create-catalog
Pipeline status¶
The pipeline is considered functionally complete, but still evolving with respect to:
- robustness across different environments;
- automated testing;
- handling of edge cases.
Open activities are tracked via GitHub Issues.