Decision 0022 - Fontshow — Specimen Inference Inputs Matrix v1.0
Date: 02/03/2026
Status: Accepted
Type: AUTHORITATIVE DESIGN CONTRACT
Applies to: schema ≥ v1.2
Phases: dump-fonts → parse-inventory
1. Architectural Principle
Fontshow follows a strict two-phase pipeline:
Discovery Phase (dump-fonts)
↓ produces IR (font_inventory.json)
Interpretation Phase (parse-inventory)
↓ produces enriched IR
Hard Rule
parse-inventory MUST operate as:
JSON → JSON (pure transformation)
and MUST NOT access font files.
All font-dependent data must be collected during dump-fonts.
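The hard rule can be pictured as a function whose only input is the parsed inventory. The following is a minimal sketch, not the project's implementation: the `fonts` key, the `enrich` callback, and the `codepoint_count` field are hypothetical names used for illustration only.

```python
import copy
from typing import Any, Callable

Font = dict[str, Any]


def parse_inventory(
    inventory: dict[str, Any],
    enrich: Callable[[Font], Font],
) -> dict[str, Any]:
    """Return an enriched copy of the inventory; the input is never mutated."""
    result = copy.deepcopy(inventory)
    # "fonts" is a hypothetical key; the real inventory layout may differ.
    result["fonts"] = [enrich(font) for font in result.get("fonts", [])]
    return result


def add_codepoint_count(font: Font) -> Font:
    """Toy enrichment that uses only data already present in the JSON."""
    ranges = font.get("charset_ranges", [])
    font["codepoint_count"] = sum(end - start + 1 for start, end in ranges)
    return font
```

Because the function receives a dictionary and returns a dictionary, there is no path through which a font file could be opened, which is the property the rule is meant to guarantee.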
2. Terminology
| Term | Meaning |
|---|---|
| Signal | Raw observable fact extracted from font |
| Inference | Semantic interpretation derived from signals |
| Specimen | Representative text shown for catalog display |
3. Specimen Generation Responsibilities
dump-fonts (Discovery)
MUST collect signals only.
Allowed:
- Read font files
- Extract raw metadata
- Export coverage information
Forbidden:
- Language inference
- Script inference
- Specimen selection
- Heuristic interpretation
parse-inventory (Interpretation)
MUST:
- derive specimen deterministically
- rely exclusively on inventory JSON
- never open font files
4. Required Signals (Dump Output)
These fields MUST exist or be derivable from dump output.
4.1 Unicode Coverage (existing)
"unicode_blocks": {...}
Purpose:
- script inference
- language inference
4.2 Charset Ranges (NEW — REQUIRED)
Compact synthesis of cmap coverage.
"charset_ranges": [
[32,126],
[160,255]
]
Definition:
charset_ranges := merged contiguous Unicode codepoint ranges
present in cmap tables.
Properties:
- lossless for coverage queries
- deterministic
- small storage footprint
- removes the need to reopen cmap tables in parse-inventory
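As an illustration of the definition, the sketch below merges a set of codepoints into ranges and answers a coverage query from the ranges alone. It assumes each pair is an inclusive `[start, end]` interval, which the example values suggest but the contract does not state. The function names `merge_codepoints` and `covers` are hypothetical.

```python
from bisect import bisect_right
from typing import Iterable


def merge_codepoints(codepoints: Iterable[int]) -> list[list[int]]:
    """Collapse codepoints into sorted, inclusive [start, end] ranges."""
    ranges: list[list[int]] = []
    for cp in sorted(set(codepoints)):
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1][1] = cp  # extend the current contiguous run
        else:
            ranges.append([cp, cp])  # start a new run
    return ranges


def covers(charset_ranges: list[list[int]], codepoint: int) -> bool:
    """Binary-search the ranges for a codepoint; no font file is needed."""
    starts = [start for start, _ in charset_ranges]
    i = bisect_right(starts, codepoint) - 1
    return i >= 0 and charset_ranges[i][0] <= codepoint <= charset_ranges[i][1]
```

In dump-fonts, the input codepoints would come from the font's cmap tables. In parse-inventory, only `covers` (or an equivalent) would be needed, operating on the stored ranges.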
4.3 Glyph Count (NEW — REQUIRED)
"glyph_count": 1342
Definition: total number of glyphs in the font.
Used for:
- specimen sizing heuristics
- density estimation
- catalog diagnostics
No semantic interpretation implied.
4.4 Sample Text (existing)
"sample_text": ["Example text"]
Meaning: specimen text provided by the font itself.
This is the highest-authority signal.
5. Specimen Inference Decision Tree
Performed ONLY in parse-inventory: dump-fonts defers specimen inference entirely. The schema is extended with a "deferred" specimen_strategy state.
Step 1 — Internal specimen
IF sample_text exists:
specimen_text := sample_text
specimen_strategy := "internal"
specimen_glyph_count := glyph_count(sample_text)
STOP
Step 2 — Script-aware synthesis
Inputs:
- charset_ranges
- inferred scripts
- unicode_blocks
Algorithm:
alphabet := canonical script alphabet
available := alphabet ∩ charset_ranges
specimen := deterministic subset(available)
Output:
specimen_strategy := "script"
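The intersection in Step 2 can be sketched as follows. This is an illustration under assumptions: the canonical alphabet is passed in as a plain string, ranges are treated as inclusive, and "deterministic subset" is taken to mean the first `max_chars` covered characters in alphabet order. The function name and the `max_chars` parameter are hypothetical.

```python
def synthesize_script_specimen(
    alphabet: str,
    charset_ranges: list[list[int]],
    max_chars: int = 26,
) -> str:
    """Keep the alphabet characters the font covers, in alphabet order."""

    def covered(ch: str) -> bool:
        cp = ord(ch)
        return any(start <= cp <= end for start, end in charset_ranges)

    # Iterating over the alphabet (not over a set) keeps the order stable,
    # so the same inventory always yields the same specimen.
    available = [ch for ch in alphabet if covered(ch)]
    return "".join(available[:max_chars])
```

For example, a font covering only `[[65, 70]]` combined with the alphabet `"ABCDEFGHIJ"` would yield `"ABCDEF"`.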
Step 3 — Coverage fallback
If script synthesis is insufficient:
specimen := representative characters
selected from available coverage
Output:
specimen_strategy := "coverage"
Step 4 — Minimal fallback
If only sparse coverage exists:
specimen := smallest valid printable subset
Output:
specimen_strategy := "minimal"
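The four steps form a cascade that can be sketched as one function. The contract does not define when script synthesis counts as "insufficient" or when coverage counts as "sparse", so the thresholds below are placeholders. Taking the first `sample_text` entry and picking the lowest printable codepoints as "representative" are likewise assumptions made for this sketch, and all names are hypothetical.

```python
from typing import Any

MIN_SCRIPT_CHARS = 8    # hypothetical threshold for "insufficient"
MIN_COVERAGE_CHARS = 4  # hypothetical threshold for "sparse"
COVERAGE_SAMPLE = 16    # hypothetical specimen length for the fallbacks


def printable_chars(charset_ranges: list[list[int]], limit: int) -> list[str]:
    """First `limit` printable, non-space characters in range order."""
    chars: list[str] = []
    for start, end in charset_ranges:
        for cp in range(start, end + 1):
            ch = chr(cp)
            if ch.isprintable() and not ch.isspace():
                chars.append(ch)
                if len(chars) == limit:
                    return chars
    return chars


def infer_specimen(font: dict[str, Any], script_specimen: str = "") -> tuple[str, str]:
    """Return (specimen_text, specimen_strategy) from inventory data only."""
    sample = font.get("sample_text") or []
    if sample:  # Step 1: the font's own specimen wins
        return sample[0], "internal"
    if len(script_specimen) >= MIN_SCRIPT_CHARS:  # Step 2 result, if usable
        return script_specimen, "script"
    chars = printable_chars(font.get("charset_ranges", []), COVERAGE_SAMPLE)
    if len(chars) >= MIN_COVERAGE_CHARS:  # Step 3
        return "".join(chars), "coverage"
    return "".join(chars), "minimal"  # Step 4
```

A digits-only font with `charset_ranges` of `[[48, 57]]` falls through to the coverage step and gets `"0123456789"` rather than being rejected.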
6. Explicit Non-Goals
Specimen generation MUST NOT:
- reject numeric-only fonts
- assume alphabetic usage
- enforce typographic expectations
- modify discovery data
The catalog is descriptive, not prescriptive.
7. Determinism Requirements
Given identical input inventory:
- specimen_text MUST be identical
- ordering MUST be stable
- no randomness allowed
- no environment dependence
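One way to check these requirements is to run the transformation several times on the same inventory and compare digests of the serialized output. This is a sketch of such a check, not part of the contract, and the helper names are hypothetical. Keys are deliberately not sorted during serialization, so a change in key order or list order also changes the digest.

```python
import hashlib
import json
from typing import Any, Callable

Inventory = dict[str, Any]


def inventory_digest(enriched: Inventory) -> str:
    """SHA-256 of the JSON text; key order and list order both matter."""
    text = json.dumps(enriched, separators=(",", ":"))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def assert_deterministic(
    transform: Callable[[Inventory], Inventory],
    inventory: Inventory,
    runs: int = 3,
) -> None:
    """Raise AssertionError if repeated runs produce different output."""
    digests = {inventory_digest(transform(inventory)) for _ in range(runs)}
    if len(digests) != 1:
        raise AssertionError(f"{len(digests)} distinct outputs in {runs} runs")
```

A check like this catches randomness and unstable ordering within one process. Environment dependence would additionally need runs on different machines or locales.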
8. Performance Guarantees
After compliance:
parse-inventory complexity:
O(number_of_fonts)
with NO font IO.
Expected runtime reduction:
~20–60s → ~2–4s on large systems.
9. Regression Guard
Any future change introducing:
TTFont(...)
fontTools access
filesystem reads
inside parse-inventory is a CONTRACT VIOLATION.
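Part of this guard could be automated with a static check over the parse-inventory source. The sketch below uses Python's standard `ast` module to flag fontTools imports and `TTFont(...)` calls. It is an assumption about how the guard might be enforced, the function name is hypothetical, and it does not detect general filesystem reads, which are harder to identify statically.

```python
import ast

FORBIDDEN_MODULES = {"fontTools"}
FORBIDDEN_CALLS = {"TTFont"}


def contract_violations(source: str) -> list[str]:
    """List fontTools imports and TTFont(...) calls found in Python source."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] in FORBIDDEN_MODULES:
                found.append((node.lineno, f"import of {module}"))
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both TTFont(...) and ttLib.TTFont(...).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name in FORBIDDEN_CALLS:
                found.append((node.lineno, f"call to {name}"))
    return [f"line {line}: {what}" for line, what in sorted(found)]
```

A test in the project's suite could read the parse-inventory module text, pass it to this function, and fail when the returned list is non-empty.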
END OF CONTRACT