Decision 0022 - Fontshow — Specimen Inference Inputs Matrix v1.0

Date: 02/03/2026
Status: Accepted
Type: AUTHORITATIVE DESIGN CONTRACT
Applies to: schema ≥ v1.2
Phases: dump-fonts → parse-inventory


1. Architectural Principle

Fontshow follows a strict two-phase pipeline:

    Discovery Phase  (dump-fonts)
        ↓ produces IR (font_inventory.json)
    Interpretation Phase (parse-inventory)
        ↓ produces enriched IR

Hard Rule

parse-inventory MUST operate as:

    JSON → JSON (pure transformation)

and MUST NOT access font files.

All font-dependent data must be collected during dump-fonts.
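The hard rule can be pictured as a function that takes the inventory as data and returns a new, enriched structure. The following is only a minimal sketch: the top-level `fonts` key and the `inference` key are hypothetical names chosen for illustration, not part of the schema defined here.

```python
import copy
from typing import Any


def parse_inventory(inventory: dict[str, Any]) -> dict[str, Any]:
    """Return an enriched copy of the inventory (JSON in, JSON out)."""
    # Work on a deep copy so the discovery data is never modified in place.
    enriched = copy.deepcopy(inventory)
    # "fonts" and "inference" are hypothetical key names for this sketch.
    for font in enriched.get("fonts", []):
        font.setdefault("inference", {})
    return enriched
```

Nothing in this function receives a path or a file handle, so it has no way to open a font file.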


2. Terminology

Term       Meaning
---------  ---------------------------------------------
Signal     Raw observable fact extracted from font
Inference  Semantic interpretation derived from signals
Specimen   Representative text shown for catalog display

3. Specimen Generation Responsibilities

dump-fonts (Discovery)

MUST collect signals only.

Allowed:

  • Read font files
  • Extract raw metadata
  • Export coverage information

Forbidden:

  • Language inference
  • Script inference
  • Specimen selection
  • Heuristic interpretation

parse-inventory (Interpretation)

MUST:

  • derive specimen deterministically
  • rely exclusively on inventory JSON
  • never open font files

4. Required Signals (Dump Output)

These fields MUST exist or be derivable from dump output.

4.1 Unicode Coverage (existing)

"unicode_blocks": {...}

Purpose:

  • script inference
  • language inference

4.2 Charset Ranges (NEW — REQUIRED)

Compact synthesis of cmap coverage.

"charset_ranges": [
  [32,126],
  [160,255]
]

Definition:

    charset_ranges := merged contiguous Unicode codepoint ranges
                      present in cmap tables.

Properties:

  • lossless for coverage queries
  • deterministic
  • small storage footprint
  • replaces cmap reopening
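As an illustration of the definition above, the sketch below merges a set of codepoints into contiguous ranges and answers a coverage query without touching the font. It assumes that both bounds of a range are inclusive, which matches the `[32,126]` example but is not stated explicitly in this contract. The function names are hypothetical.

```python
from bisect import bisect_right


def merge_codepoints(codepoints):
    """Collapse codepoints into sorted, merged [start, end] ranges."""
    ranges = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current contiguous run
        else:
            ranges.append([cp, cp])  # start a new run
    return ranges


def covers(charset_ranges, codepoint):
    """Return True if codepoint falls inside one of the sorted ranges."""
    starts = [start for start, _ in charset_ranges]
    i = bisect_right(starts, codepoint) - 1  # last range starting at or before it
    return i >= 0 and charset_ranges[i][0] <= codepoint <= charset_ranges[i][1]
```

In this picture, `merge_codepoints` would run in dump-fonts over the codepoints found in the cmap tables, and `covers` would run in parse-inventory against the stored ranges.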

4.3 Glyph Count (NEW — REQUIRED)

"glyph_count": 1342

Definition: Total number of glyphs in the font.

Used for:

  • specimen sizing heuristics
  • density estimation
  • catalog diagnostics

No semantic interpretation implied.


4.4 Sample Text (existing)

"sample_text": ["Example text"]

Meaning: Embedded font-provided specimen.

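Highest-authority signal: when present, it is used as the specimen before any synthesis is attempted (Step 1 of the decision tree).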
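A consumer of the dump output might check the required signals before running any inference. The sketch below is one possible check, under assumptions: it treats `sample_text` as optional (the decision tree only uses it if it exists), it does not attempt the "derivable" case, and the function name is hypothetical.

```python
def validate_signals(font):
    """Return a list of problems found in one font record (empty if none)."""
    errors = []
    if not isinstance(font.get("unicode_blocks"), dict):
        errors.append("unicode_blocks: expected an object")
    glyph_count = font.get("glyph_count")
    if not isinstance(glyph_count, int) or glyph_count < 0:
        errors.append("glyph_count: expected a non-negative integer")
    ranges = font.get("charset_ranges")
    if not isinstance(ranges, list):
        errors.append("charset_ranges: expected a list")
        return errors
    previous_end = None
    for pair in ranges:
        if not (isinstance(pair, list) and len(pair) == 2 and pair[0] <= pair[1]):
            errors.append(f"charset_ranges: malformed range {pair!r}")
            continue
        # Merged ranges must be sorted and separated by at least one gap.
        if previous_end is not None and pair[0] <= previous_end + 1:
            errors.append(f"charset_ranges: range {pair!r} is not merged or not sorted")
        previous_end = pair[1]
    return errors
```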


5. Specimen Inference Decision Tree

Performed ONLY in parse-inventory: specimen inference is deferred from dump-fonts to parse-inventory. The schema is extended with a "deferred" specimen_strategy state.

Step 1 — Internal specimen

IF sample_text exists:

specimen_text       := sample_text
specimen_strategy   := "internal"
specimen_glyph_count := glyph_count(sample_text)
STOP

Step 2 — Script-aware synthesis

Inputs:

  • charset_ranges
  • inferred scripts
  • unicode_blocks

Algorithm:

alphabet := canonical script alphabet
available := alphabet ∩ charset_ranges
specimen := deterministic subset(available)

Output:

specimen_strategy := "script"
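The Step 2 algorithm can be sketched as follows. This is only an illustration: the Latin alphabet constant, the `max_chars` limit, and the choice of "first N characters in canonical order" as the deterministic subset are all assumptions, since the contract does not fix them.

```python
# Hypothetical canonical alphabet for one script, used only for illustration.
LATIN_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def in_ranges(charset_ranges, codepoint):
    """Return True if codepoint lies in any inclusive [start, end] range."""
    return any(start <= codepoint <= end for start, end in charset_ranges)


def script_specimen(alphabet, charset_ranges, max_chars=16):
    """Intersect the alphabet with coverage, keeping canonical order."""
    available = [ch for ch in alphabet if in_ranges(charset_ranges, ord(ch))]
    # Taking a fixed-length prefix keeps the subset deterministic.
    return "".join(available[:max_chars])
```

Because the alphabet is iterated in a fixed order and no set iteration is involved, the same inventory always yields the same specimen.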

Step 3 — Coverage fallback

If script synthesis is insufficient:

specimen := representative characters
            selected from available coverage

Output:

specimen_strategy := "coverage"

Step 4 — Minimal fallback

If only sparse coverage exists:

specimen := smallest valid printable subset

Output:

specimen_strategy := "minimal"
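Taken together, the four steps might look like the sketch below. Several details are assumptions made for the example and are not defined by this contract: the `scripts` key holding the inferred scripts, the `min_chars` threshold that decides when synthesis is "insufficient" or coverage is "sparse", the use of the first `sample_text` entry, and the choice of the lowest printable codepoints as "representative characters".

```python
from itertools import islice


def iter_printable(charset_ranges):
    """Yield printable, non-space characters in ascending codepoint order."""
    for start, end in charset_ranges:
        for cp in range(start, end + 1):
            ch = chr(cp)
            if ch.isprintable() and not ch.isspace():
                yield ch


def select_specimen(font, alphabets, min_chars=4, max_chars=16):
    """Return (specimen_text, specimen_strategy) for one font record."""
    # Step 1: an embedded sample always wins.
    samples = font.get("sample_text") or []
    if samples:
        return samples[0], "internal"

    ranges = font.get("charset_ranges", [])

    def covered(ch):
        return any(start <= ord(ch) <= end for start, end in ranges)

    # Step 2: first inferred script whose alphabet is sufficiently covered.
    for script in font.get("scripts", []):  # "scripts" is a hypothetical key
        available = [ch for ch in alphabets.get(script, "") if covered(ch)]
        if len(available) >= min_chars:
            return "".join(available[:max_chars]), "script"

    # Steps 3 and 4: fall back to whatever printable coverage exists.
    chars = list(islice(iter_printable(ranges), max_chars))
    strategy = "coverage" if len(chars) >= min_chars else "minimal"
    return "".join(chars), strategy
```

With these assumptions, a digits-only font with ranges `[[48, 57]]` and no inferred script falls through to the coverage step and still receives a specimen rather than being rejected.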

6. Explicit Non-Goals

Specimen generation MUST NOT:

  • reject numeric-only fonts
  • assume alphabetic usage
  • enforce typographic expectations
  • modify discovery data

Catalog is descriptive, not prescriptive.


7. Determinism Requirements

Given identical input inventory:

  • specimen_text MUST be identical
  • ordering MUST be stable
  • no randomness allowed
  • no environment dependence
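One way to check these requirements in a test suite is to compare a canonical digest of the enriched output across runs. The sketch below is an assumption about how such a check could be built, not a mechanism this contract prescribes; the function name is hypothetical.

```python
import hashlib
import json


def inventory_digest(inventory):
    """Return a SHA-256 hex digest of a canonical JSON serialization."""
    canonical = json.dumps(
        inventory,
        sort_keys=True,          # object key order must not affect the digest
        ensure_ascii=False,
        separators=(",", ":"),   # fixed separators, no incidental whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys removes dictionary-order noise, while list order remains significant, so an unstable ordering of fonts or specimen characters would change the digest.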

8. Performance Guarantees

After compliance:

parse-inventory complexity:

    O(number_of_fonts)

with NO font IO.

Expected runtime reduction:

    ~20–60s → ~2–4s on large systems.

9. Regression Guard

Any future change introducing:

    TTFont(...)
    fontTools access
    filesystem reads

inside parse-inventory is a CONTRACT VIOLATION.
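Part of this guard could be automated with a static check over the parse-inventory source. The sketch below uses Python's `ast` module to flag `fontTools` imports and `TTFont(...)` calls. It is an illustrative assumption rather than an existing project test, and it does not attempt to detect general filesystem reads, which are harder to identify statically.

```python
import ast

FORBIDDEN_MODULES = {"fontTools"}
FORBIDDEN_CALLS = {"TTFont"}


def contract_violations(source):
    """Return messages for forbidden imports and calls, ordered by line."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                found.append((node.lineno, f"line {node.lineno}: import of {name}"))
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both TTFont(...) and module.TTFont(...).
            called = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
            if called in FORBIDDEN_CALLS:
                found.append((node.lineno, f"line {node.lineno}: call to {called}()"))
    return [message for _, message in sorted(found)]
```

A test could read the parse-inventory module's source text, pass it to `contract_violations`, and fail if the returned list is not empty.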


END OF CONTRACT