Decision 0025 - Drive Script Inference from Ontology Data

Date: 16/03/2026
Status: Accepted

Context

Script inference previously depended on hard-coded tables inside fontshow.inventory.script_analysis. Adding support for a new writing system required editing both ontology data and inference code.

That structure had three problems:

the ontology already contained most of the information needed to describe script evidence and representative languages;
adding support was never a data-only change, which made incremental expansion error-prone;
special cases such as Japanese collapse, broad-neighbor scripts, and unicode.max fallbacks were not documented in the authoritative ontology.

The project goal for this phase is to make script support expansion primarily a data-maintenance task: new scripts and languages should be added by extending ontology rows, not by patching the inference engine.

Decision

Script inference is now driven by ontology metadata stored in SCRIPT_INFO, with language fallback metadata normalized in LANGUAGE_INFO.

The ontology schema is extended as follows:

LANGUAGE_INFO adds primary_script
SCRIPT_INFO adds:
  required_blocks
  optional_blocks
  suppresses
  inference_priority
  unicode_max_ranges
  block_match
  collapse_group
  preferred_over
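
To make the extended schema concrete, the sketch below shows what one language row and one script row might look like. Only the field names come from this decision. The value types, the scripts key on the language row, and every sample value are assumptions for illustration and are not copied from the project ontology.

```python
from typing import TypedDict


class LanguageInfo(TypedDict, total=False):
    scripts: list[str]      # assumed name for the language's script profile
    primary_script: str     # field added by this decision


class ScriptInfo(TypedDict, total=False):
    # Field names come from the decision; the types are assumptions.
    required_blocks: list[str]
    optional_blocks: list[str]
    suppresses: list[str]
    inference_priority: int
    unicode_max_ranges: list[tuple[int, int]]
    block_match: str
    collapse_group: str
    preferred_over: list[str]


# Illustrative rows only; none of these values are taken from the real ontology.
LANGUAGE_INFO: dict[str, LanguageInfo] = {
    "ja": {"scripts": ["JPAN"], "primary_script": "JPAN"},
}

SCRIPT_INFO: dict[str, ScriptInfo] = {
    "JPAN": {
        "required_blocks": ["Hiragana", "Katakana"],
        "optional_blocks": ["CJK Unified Ideographs"],
        "suppresses": ["HIRA", "KANA"],
        "inference_priority": 10,
        "unicode_max_ranges": [(0x3040, 0x30FF)],  # Hiragana through Katakana
        "block_match": "all",
        "collapse_group": "japanese",
        "preferred_over": ["HANI"],
    },
}
```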

The runtime model is:

language rows declare their primary script explicitly;
script rows expose the data needed for block-based and unicode.max-based inference;
script_analysis evaluates scripts generically from ontology data;
collapse and precedence behavior are expressed by ontology fields rather than hard-coded script-name tables.
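
The generic evaluation described above can be sketched as a single loop over script rows with no script names in the code. This is a minimal illustration, not the actual script_analysis implementation: the function name infer_scripts, the "all"/"any" meaning of block_match, and the way suppresses and inference_priority are applied are all assumptions.

```python
from typing import Any

# Illustrative rows only; block names, priorities, and values are assumptions.
SCRIPT_INFO: dict[str, dict[str, Any]] = {
    "LATN": {"required_blocks": ["Basic Latin"], "block_match": "all", "inference_priority": 1},
    "HIRA": {"required_blocks": ["Hiragana"], "block_match": "all", "inference_priority": 1},
    "KANA": {"required_blocks": ["Katakana"], "block_match": "all", "inference_priority": 1},
    "JPAN": {
        "required_blocks": ["Hiragana", "Katakana"],
        "block_match": "all",
        "inference_priority": 10,
        "suppresses": ["HIRA", "KANA"],
    },
}


def infer_scripts(
    covered_blocks: set[str], script_info: dict[str, dict[str, Any]]
) -> list[str]:
    """Return matched script codes, highest inference_priority first (hypothetical helper)."""
    matched: list[str] = []
    for code, row in script_info.items():
        required = row.get("required_blocks", [])
        if not required:
            continue  # a row without block evidence cannot match by blocks
        hits = [block in covered_blocks for block in required]
        # Assumed semantics: "all" needs every required block, anything else needs one.
        ok = all(hits) if row.get("block_match", "all") == "all" else any(hits)
        if ok:
            matched.append(code)

    # Any matched script may suppress narrower scripts that also matched.
    suppressed = {
        target for code in matched for target in script_info[code].get("suppresses", [])
    }
    kept = [code for code in matched if code not in suppressed]
    # Sort by descending priority, then by code for a stable order.
    return sorted(kept, key=lambda c: (-script_info[c].get("inference_priority", 0), c))


print(infer_scripts({"Basic Latin", "Hiragana", "Katakana"}, SCRIPT_INFO))
```

Under these assumptions, supporting another script means adding a row to the table; the loop itself does not change.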

Defaults are normalized at module load time:

primary_script is backfilled from the first script in each language profile;
many script inference defaults are derived from the representative language profile;
scripts with special disambiguation behavior override those defaults explicitly.
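
The normalization rules above might be sketched as follows. The helper names and the keys scripts, blocks, and representative_language are hypothetical, and the real logic in language_tables.py may differ. The sketch shows only the pattern: backfill a default, and never overwrite an explicit value.

```python
import copy
from typing import Any

Rows = dict[str, dict[str, Any]]


def normalize_language_info(language_info: Rows) -> Rows:
    """Backfill primary_script from the first script of each language profile."""
    normalized = copy.deepcopy(language_info)  # leave the source rows untouched
    for row in normalized.values():
        scripts = row.get("scripts", [])  # "scripts" is an assumed key name
        if "primary_script" not in row and scripts:
            row["primary_script"] = scripts[0]
    return normalized


def normalize_script_info(script_info: Rows, language_info: Rows) -> Rows:
    """Derive script defaults from the representative language unless overridden."""
    normalized = copy.deepcopy(script_info)
    for row in normalized.values():
        # "representative_language" and "blocks" are assumed key names.
        rep = language_info.get(row.get("representative_language", ""), {})
        row.setdefault("required_blocks", list(rep.get("blocks", [])))
        row.setdefault("inference_priority", 0)
    return normalized


# Illustrative data only.
languages: Rows = {
    "ru": {"scripts": ["CYRL", "LATN"], "blocks": ["Cyrillic"]},
    "sr": {"scripts": ["CYRL", "LATN"], "primary_script": "LATN"},
}
scripts: Rows = {
    "CYRL": {"representative_language": "ru"},
    "JPAN": {"representative_language": "ja", "required_blocks": ["Hiragana", "Katakana"]},
}

normalized_languages = normalize_language_info(languages)
normalized_scripts = normalize_script_info(scripts, languages)
```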

JPAN is introduced as a first-class ontology script so Japanese collapse behavior is represented in production data rather than in inference code comments or ad hoc special cases.
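
One possible reading of data-driven collapse is sketched below. It assumes that scripts sharing a collapse_group label are reduced to the detected member with the highest inference_priority. That interpretation, the helper name collapse_detected, and the sample rows are assumptions and are not confirmed by the ontology.

```python
from typing import Any

# Illustrative rows only; group labels and priorities are assumptions.
SCRIPT_INFO: dict[str, dict[str, Any]] = {
    "LATN": {"inference_priority": 1},
    "HIRA": {"collapse_group": "japanese", "inference_priority": 1},
    "KANA": {"collapse_group": "japanese", "inference_priority": 1},
    "JPAN": {"collapse_group": "japanese", "inference_priority": 10},
}


def collapse_detected(
    detected: list[str], script_info: dict[str, dict[str, Any]]
) -> list[str]:
    """Keep one script per collapse_group: the detected member with top priority."""
    winners: dict[str, str] = {}
    for code in detected:
        row = script_info.get(code, {})
        group = row.get("collapse_group")
        if group is None:
            continue  # ungrouped scripts are never collapsed
        best = winners.get(group)
        if best is None or row.get("inference_priority", 0) > script_info[best].get(
            "inference_priority", 0
        ):
            winners[group] = code

    kept: list[str] = []
    for code in detected:
        group = script_info.get(code, {}).get("collapse_group")
        if group is None or winners[group] == code:
            kept.append(code)  # preserve the original detection order
    return kept


print(collapse_detected(["LATN", "HIRA", "KANA", "JPAN"], SCRIPT_INFO))
```

The point of the sketch is that the function never names Japanese: the behavior follows entirely from the rows.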

Consequences

Positive:

adding a new script now mostly means updating ontology rows;
script inference rules live next to the script metadata they affect;
rendering, specimen selection, and inference now share one authoritative script description;
Japanese collapse and broad-neighbor precedence are explicit and reviewable.

Trade-offs:

ontology rows are richer and therefore more demanding to curate;
module-load normalization introduces a small amount of derived data logic in language_tables.py;
preflight validation must enforce the expanded schema to keep the ontology trustworthy.
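
A preflight check for the expanded schema could look roughly like the sketch below. The function name validate_ontology, the expected field types, and the choice to validate rows after normalization are assumptions; the project's actual preflight rules are not described in this decision.

```python
from typing import Any

Rows = dict[str, dict[str, Any]]

# Field names come from the decision; the expected types are assumptions.
SCRIPT_FIELD_TYPES: dict[str, type] = {
    "required_blocks": list,
    "optional_blocks": list,
    "suppresses": list,
    "inference_priority": int,
    "unicode_max_ranges": list,
    "block_match": str,
    "collapse_group": str,
    "preferred_over": list,
}


def validate_ontology(language_info: Rows, script_info: Rows) -> list[str]:
    """Return human-readable schema errors; an empty list means the rows pass."""
    errors: list[str] = []
    for code, row in language_info.items():
        primary = row.get("primary_script")
        if primary is None:
            errors.append(f"language {code}: missing primary_script")
        elif primary not in script_info:
            errors.append(f"language {code}: unknown primary_script {primary!r}")

    for code, row in script_info.items():
        for field, expected in SCRIPT_FIELD_TYPES.items():
            if field not in row:
                errors.append(f"script {code}: missing {field}")
            elif not isinstance(row[field], expected):
                errors.append(f"script {code}: {field} must be {expected.__name__}")
        # Cross-references must point at scripts that exist in the ontology.
        for ref_field in ("suppresses", "preferred_over"):
            refs = row.get(ref_field)
            if isinstance(refs, list):
                for target in refs:
                    if target not in script_info:
                        errors.append(f"script {code}: {ref_field} names unknown script {target!r}")
    return errors


# Illustrative data only: one deliberately broken language row and script row.
bad_languages: Rows = {"xx": {"scripts": ["ZZZZ"]}}
bad_scripts: Rows = {"LATN": {"required_blocks": ["Basic Latin"], "suppresses": ["NOPE"]}}
for message in validate_ontology(bad_languages, bad_scripts):
    print(message)
```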

Operationally, the quality gates for this decision are:

ruff check .
mypy .
pytest -q

These gates passed when this decision was adopted.