Data Dictionary¶
This document defines the normative JSON schema used by Fontshow. All inventories generated or processed by Fontshow tools MUST conform to the structures described here.
The dictionary describes:
- extracted metadata
- pass-through metadata
- inferred / derived metadata
Top-level structure¶
{
  "metadata": { ... },
  "fonts": [ ... ]
}
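As a minimal sketch of how a consumer might check this top-level shape before reading anything else, the snippet below validates only the two keys shown above. The function name `check_top_level` and the sample `schema_version` value are illustrative assumptions, not part of Fontshow.

```python
import json


def check_top_level(inventory):
    """Return a list of problems found in the top-level structure."""
    if not isinstance(inventory, dict):
        return ["inventory must be a JSON object"]
    problems = []
    if not isinstance(inventory.get("metadata"), dict):
        problems.append("'metadata' must be an object")
    if not isinstance(inventory.get("fonts"), list):
        problems.append("'fonts' must be a list")
    return problems


# Hypothetical payload; the schema_version value is made up for illustration.
sample = json.loads('{"metadata": {"schema_version": "1.0"}, "fonts": []}')
print(check_top_level(sample))  # prints []
```

An empty list means the payload has the expected top-level keys; it says nothing about the contents of each font entry.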
metadata¶
Metadata about how and when the inventory was generated or enriched.
Fields¶
- schema_version (string): Inventory schema version.
- input_inventory_tool (string): Name of the tool that produced the current inventory payload.
- input_inventory_tool_version (string): Fontshow version used by the producing tool.
- inference_level (string): Inference strategy used when enriching the inventory.
- fonttools (object)
  - available (boolean)
  - fontconfig_charset_included (boolean)
  - version (string)
- run_environment (object)
  - hostname (string)
  - username (string)
  - os (string)
  - os_release (string)
  - kernel (string)
  - machine (string)
  - python_version (string)
  - platform (string)
  - execution_context (string, e.g. native, wsl, container, other)
fonts¶
List of font entries. Each entry represents a single normalized font face descriptor emitted by the inventory pipeline.
Font entry¶
identity¶
Font identity and naming metadata.
- file (string): Canonical path to the font file.
- ttc_index (integer or null)
- family (string or null)
- style (string or null)
- fullname (string or null)
- postscript_name (string or null)
- id (string): Stable internal identifier.
platform¶
- name (string): Platform where the font was discovered (e.g. linux, windows).
format¶
Font container and format classification.
- container (string, e.g. TTF, OTF, TTC)
- font_type (string)
- ttc_index (integer or null)
- ttc_count (integer or null)
- variable (boolean)
- color (boolean)
- decorative (boolean)
coverage¶
Raw and derived coverage metadata. Coverage fields are descriptive and diagnostic; no inference or semantic decisions are performed at this level.
- unicode (object)
  - count (integer)
  - min (integer)
  - max (integer)
- unicode_blocks (object): Mapping of Unicode block names to approximate coverage metrics. This data is typically extracted from external tools or inventory sources.
- scripts (list of strings): Script tags declared by FontConfig (ISO 15924 / OpenType).
- languages (list of strings): Language tags declared by FontConfig (lang:), BCP-47 style.
- charset (object or null): Raw FontConfig charset ranges, when enabled at extraction time. This data is file-level, best-effort, and not normalized.
Charset-derived coverage (diagnostic)¶
The following fields are derived from FontConfig charset data when available. They are diagnostic only and do not affect inference logic.
- normalized_charset (object): Deterministic normalization of raw charset ranges. Ranges are sorted and merged, and a total codepoint count is computed.
  - ranges (list of [start, end] integer pairs)
  - codepoints_count (integer)
- unicode_blocks_from_charset (object): Mapping of Unicode block names to the number of codepoints covered by the normalized charset. This field does not replace coverage.unicode_blocks.
- script_coverage_from_charset (object): Mapping of script tags (ISO 15924) to estimated coverage ratios (floating-point values between 0.0 and 1.0). This data is informational and non-authoritative.
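The "sorted and merged" normalization described for normalized_charset can be sketched as follows. This is an illustration under stated assumptions, not the Fontshow implementation: it assumes ranges are inclusive [start, end] pairs and that adjacent ranges (where one ends at N and the next starts at N+1) are merged as well as overlapping ones. The function name `normalize_charset` is hypothetical.

```python
def normalize_charset(ranges):
    """Sort and merge inclusive [start, end] ranges and count codepoints.

    Assumption: ranges are inclusive and adjacent ranges are merged.
    """
    merged = []
    for start, end in sorted((int(s), int(e)) for s, e in ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    count = sum(end - start + 1 for start, end in merged)
    return {"ranges": merged, "codepoints_count": count}


# Unsorted, overlapping and adjacent input ranges collapse into one.
result = normalize_charset([[0x41, 0x5A], [0x20, 0x40], [0x61, 0x7A], [0x50, 0x60]])
print(result)  # prints {'ranges': [[32, 122]], 'codepoints_count': 91}
```

Because the input is sorted before merging, the same set of raw ranges always yields the same output regardless of input order, which is what makes the result deterministic.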
typography¶
Typography-related metadata.
- weight_class (integer)
- width_class (integer)
- opentype_features (list of strings)
classification¶
High-level font classification flags.
- is_variable (boolean)
- is_color (boolean)
- is_decorative (boolean)
- is_emoji (boolean)
- container (string)
- font_type (string)
license¶
Font license information.
- text (string or null)
- url (string or null)
vendor¶
- vendor (string or null)
embedding_rights¶
- embedding_rights (integer)
sample_text¶
- sample_text (string or null): Optional sample text extracted from the font.
source¶
Extraction diagnostics.
- fonttools
  - ok (boolean)
  - error (string or null)
- fontconfig
  - ok (boolean)
inference¶
Derived metadata computed by parse-inventory.
Inference is deterministic and reproducible.
- level (string)
- scripts (list of strings): Inferred ISO 15924 scripts.
- languages (list of strings): Languages inferred from scripts.
- declared_scripts (list of strings): Raw scripts copied from coverage.scripts.
- declared_languages (list of strings): Raw languages copied from coverage.languages.
- unicode_blocks (object): Unicode blocks reused for inference diagnostics.
fonts[].inference.scripts¶
Type: array[string]
Required: no
Values: ISO 15924 script codes (lowercase)
Example: ["latn"], ["cyrl", "grek"], ["jpan"]
This field contains the list of writing systems inferred for a font, expressed using ISO 15924 script codes.
Derivation rules¶
The value is derived by parse-inventory using a best-effort strategy
based on Unicode coverage metadata:
- Primary source: coverage.unicode_blocks. If available, Unicode block usage statistics are analyzed to infer scripts. Only blocks with significant coverage are considered.
- Fallback source: coverage.unicode.max. If block-level data is unavailable, the maximum Unicode code point supported by the font is used as a heuristic indicator.
- Normalization: All inferred scripts are normalized to ISO 15924 codes:
  - Latin → latn
  - Greek → grek
  - Cyrillic → cyrl
  - Arabic → arab
  - Hebrew → hebr
  - Devanagari → deva
  - CJK disambiguation:
    - Han only → hani
    - Han + Japanese kana → jpan
    - Han + Hangul → hang
- Unknown: If no reliable inference is possible, the value defaults to ["unknown"].
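The primary-source path, the CJK disambiguation, and the unknown default can be sketched as below. This is a rough illustration, not the parse-inventory implementation. Several pieces are assumptions: the block-name table `BLOCK_TO_SCRIPT`, the `MIN_COVERAGE` threshold standing in for "significant coverage", the idea that each coverage metric is a ratio between 0.0 and 1.0, and the choice to give kana precedence over Hangul when both accompany Han. The coverage.unicode.max fallback is left out because this document does not specify its cut-off values.

```python
# Hypothetical block-name table. Names starting with "_" are internal
# markers used only for CJK disambiguation and are never returned.
BLOCK_TO_SCRIPT = {
    "Basic Latin": "latn",
    "Greek and Coptic": "grek",
    "Cyrillic": "cyrl",
    "Arabic": "arab",
    "Hebrew": "hebr",
    "Devanagari": "deva",
    "CJK Unified Ideographs": "hani",
    "Hiragana": "_kana",
    "Katakana": "_kana",
    "Hangul Syllables": "_hangul",
}
MIN_COVERAGE = 0.5  # assumed threshold for "significant coverage"


def infer_scripts(unicode_blocks):
    """Infer script codes from a {block name: coverage ratio} mapping."""
    found = {
        BLOCK_TO_SCRIPT[name]
        for name, ratio in (unicode_blocks or {}).items()
        if name in BLOCK_TO_SCRIPT and ratio >= MIN_COVERAGE
    }
    # CJK disambiguation: Han plus kana or Hangul becomes one combined code.
    if "hani" in found:
        if "_kana" in found:
            found.discard("hani")
            found.add("jpan")
        elif "_hangul" in found:
            found.discard("hani")
            found.add("hang")
    # Kana or Hangul without Han is not covered by the rules above,
    # so this sketch simply drops the internal markers.
    found -= {"_kana", "_hangul"}
    return sorted(found) or ["unknown"]


print(infer_scripts({"Basic Latin": 1.0, "Cyrillic": 0.9}))  # prints ['cyrl', 'latn']
print(infer_scripts({"CJK Unified Ideographs": 0.8, "Hiragana": 1.0}))  # prints ['jpan']
print(infer_scripts({}))  # prints ['unknown']
```

Sorting the result keeps the output stable across runs, in line with the requirement that inference be deterministic and reproducible.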
Notes¶
- This field represents inferred information and may differ from language declarations.
- The absence of this field does not invalidate a font entry.
- Downstream tools must treat "unknown" as a valid placeholder.
Overall Notes¶
- coverage.* fields are never modified by inference.
- inference.* fields may evolve as inference logic improves.
- Consumers SHOULD rely on inference rather than coverage unless raw metadata is explicitly required.
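A consumer following this guidance might read scripts along these lines. The helper name `scripts_for` and its `raw` flag are hypothetical, and returning ["unknown"] when inference.scripts is absent is a design choice of this sketch rather than documented Fontshow behavior.

```python
def scripts_for(font_entry, raw=False):
    """Return scripts for a font entry, preferring inferred values.

    With raw=True, return the declared coverage.scripts list instead.
    """
    if raw:
        return list(font_entry.get("coverage", {}).get("scripts", []))
    inferred = font_entry.get("inference", {}).get("scripts")
    # A missing field does not invalidate the entry; this sketch reports
    # it with the same "unknown" placeholder that inference uses.
    return list(inferred) if inferred else ["unknown"]


# Hypothetical font entry, trimmed to the two fields used here.
entry = {
    "coverage": {"scripts": ["Latn"]},
    "inference": {"scripts": ["grek", "latn"]},
}
print(scripts_for(entry))            # prints ['grek', 'latn']
print(scripts_for(entry, raw=True))  # prints ['Latn']
print(scripts_for({}))               # prints ['unknown']
```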