Decision 0013 - Language normalization strategy

Date: 25/01/2026

Status: Accepted

Context

Fontshow extracts language information from multiple sources, most notably Fontconfig. These sources provide language tags that:

At the same time, downstream consumers of the inventory require a stable, normalized representation of supported languages.

Fontshow distinguishes between two language-related fields:

Normalization is performed by a dedicated procedural function: `normalize_languages()`.

The following rules are applied in order:

During normalization, entries may be discarded. Each discarded value is recorded with a reason:

| Reason | Meaning |
| --- | --- |
| `invalid_format` | Invalid or empty input |
| `unknown_language` | Not a valid ISO language |
| `variant_stripped` | Regional or annotated variant removed |
| `duplicate` | Duplicate after normalization |
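
As an illustration, the reason codes above could be modelled as a small enumeration paired with a record type for discarded values. This is only a sketch: the names `DropReason` and `DroppedEntry` are hypothetical and are not taken from the Fontshow code base; only the four string values come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class DropReason(str, Enum):
    """Reason codes from the table above (class name is hypothetical)."""

    INVALID_FORMAT = "invalid_format"      # invalid or empty input
    UNKNOWN_LANGUAGE = "unknown_language"  # not a valid ISO language
    VARIANT_STRIPPED = "variant_stripped"  # regional or annotated variant removed
    DUPLICATE = "duplicate"                # duplicate after normalization


@dataclass(frozen=True)
class DroppedEntry:
    """One discarded input value with its reason (hypothetical record type)."""

    value: str
    reason: DropReason
```

Keeping the reason as a closed set of values, rather than free text, would let callers filter or count discarded entries by reason without parsing messages.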

The normalization function returns both:

Normalization does not perform logging.

Logging is handled by the caller (typically `parse_inventory`), which may emit warnings based on the returned dropped entries.
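
A minimal sketch of what caller-side logging might look like is shown below. It assumes the dropped entries arrive as `(value, reason)` pairs; that shape, the helper name `warn_about_dropped`, and the logger name are all assumptions made for illustration, not the actual Fontshow API.

```python
import logging

# Hypothetical logger name; the real module layout may differ.
logger = logging.getLogger("fontshow.parse_inventory")


def warn_about_dropped(dropped, font_path):
    """Emit one warning per dropped entry and return how many were emitted.

    `dropped` is assumed to be an iterable of (value, reason) pairs as
    returned by the normalization step; `font_path` only adds context.
    """
    count = 0
    for value, reason in dropped:
        logger.warning("%s: dropped language %r (%s)", font_path, value, reason)
        count += 1
    return count
```

With this split, the caller decides whether a dropped entry is worth a warning, while the normalization function itself stays free of logging side effects.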

This separation ensures:

This design explicitly does NOT:

All such logic belongs to higher-level inference steps.