Fields and Codebooks
openstage provides a field metadata system for annotating case-specific model properties with structured information: variable types, controlled vocabularies, data source provenance, and human-readable labels. These annotations power codebook generation and soft validation.
Field metadata system
Case models use factory functions from openstage.models.fields to declare fields with rich metadata. Each function produces a Pydantic FieldInfo with x_ prefixed custom annotations in json_schema_extra.
from openstage.models.fields import nominal_field, id_field, date_field, text_field

class MyProcedure(Procedure):
    procedure_type: str | None = nominal_field(
        description="Type of legislative procedure.",
        label="Procedure type",
        source="my_source:procedure_type",
        known_values={
            "OLP": "Ordinary Legislative Procedure",
            "CNS": "Consultation procedure",
        },
        missing_means="Procedure type not recorded.",
        default=None,
    )
Factory functions
| Function | Variable type | Use for |
|---|---|---|
| text_field() | text | Semantic human-readable content |
| nominal_field() | nominal | Categorical values from a controlled vocabulary |
| id_field() | identifier | Structured reference strings (URIs, numbers) |
| date_field() | date | ISO 8601 date strings |
Metadata annotations
Each factory function stores its metadata in json_schema_extra under x_ prefixed keys. The variable type is fixed by the function; the other keys come from its parameters:
- x_variable_type -- one of text, nominal, identifier, date
- x_label -- human-readable field label
- x_source -- data source provenance (e.g., a CDM property URI)
- x_known_values -- controlled vocabulary dict (nominal fields only)
- x_missing_means -- what a missing value means substantively
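To make the mapping from parameters to x_ keys concrete, here is a minimal sketch of how such a dict could be assembled before being passed as json_schema_extra. The helper build_extra is hypothetical and not part of openstage, and dropping unset values is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, Optional


def build_extra(
    variable_type: str,
    label: Optional[str] = None,
    source: Optional[str] = None,
    known_values: Optional[Dict[str, str]] = None,
    missing_means: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect x_ annotations for one field."""
    candidates = {
        "x_variable_type": variable_type,
        "x_label": label,
        "x_source": source,
        "x_known_values": known_values,
        "x_missing_means": missing_means,
    }
    # Assumption: annotations that were not supplied are left out entirely.
    return {key: value for key, value in candidates.items() if value is not None}


extra = build_extra(
    "nominal",
    label="Procedure type",
    source="my_source:procedure_type",
    known_values={"OLP": "Ordinary Legislative Procedure"},
)
print(sorted(extra))
```

In this sketch a field declared without missing_means simply has no x_missing_means key, which matches the "(if present)" wording used for codebook entries.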
Soft validation
Fields that declare x_known_values are soft-validated on construction via warn_unknown_values(). Unknown values produce a UserWarning rather than raising an exception, so unexpected data is flagged without blocking processing.
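The warn-instead-of-raise behaviour can be illustrated with the standard warnings module. This is a sketch of the idea only: check_known is a hypothetical stand-in, not the implementation of warn_unknown_values(), and skipping None values is an assumption.

```python
import warnings
from typing import Mapping, Optional


def check_known(
    field_name: str, value: Optional[str], known_values: Mapping[str, str]
) -> None:
    """Hypothetical soft check: warn about unknown values, never raise."""
    if value is None:
        # Assumption: missing values are not checked against the vocabulary.
        return
    if value not in known_values:
        warnings.warn(
            f"{field_name}: unknown value {value!r}",
            UserWarning,
            stacklevel=2,
        )


known = {
    "OLP": "Ordinary Legislative Procedure",
    "CNS": "Consultation procedure",
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_known("procedure_type", "OLP", known)  # in the vocabulary: silent
    check_known("procedure_type", "XYZ", known)  # not in the vocabulary: warns

for item in caught:
    print(item.category.__name__, item.message)
```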
Codebook extraction
The field metadata embedded in case models can be extracted as a structured codebook for documentation or data dictionaries.
from openstage.models.codebook import extract_codebook, codebook_to_markdown
from openstage.models.eu import EUProcedure
entries = extract_codebook(EUProcedure)
print(codebook_to_markdown(entries))
Each codebook entry includes:
- name -- field name
- type -- JSON Schema type string
- description -- field description
- required -- whether the field is required
- inherited_from -- base class name if inherited, None if own field
- x_variable_type, x_label, x_source, x_known_values, x_missing_means -- field metadata annotations
codebook_to_markdown() renders a field table with an appendix listing controlled vocabularies.
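Because entries are plain dicts, they can be filtered with ordinary Python. The sketch below uses hand-written entries shaped like the keys listed above; the data is illustrative (the inherited title field is made up) and is not real output from extract_codebook().

```python
# Illustrative entries only; real ones would come from extract_codebook().
entries = [
    {
        "name": "procedure_type",
        "type": "string",
        "description": "Type of legislative procedure.",
        "required": False,
        "inherited_from": None,
        "x_variable_type": "nominal",
        "x_known_values": {
            "OLP": "Ordinary Legislative Procedure",
            "CNS": "Consultation procedure",
        },
    },
    {
        "name": "title",  # hypothetical inherited field
        "type": "string",
        "description": "Title of the procedure.",
        "required": True,
        "inherited_from": "Procedure",
        "x_variable_type": "text",
    },
]

# Fields declared on the model itself rather than on a base class.
own_fields = [e["name"] for e in entries if e["inherited_from"] is None]

# Codes of every controlled vocabulary; x_ keys may be absent, so use .get().
vocabularies = {
    e["name"]: sorted(e["x_known_values"])
    for e in entries
    if e.get("x_known_values")
}

print(own_fields)
print(vocabularies)
```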
API reference
Field helpers
Field metadata helpers for openstage models.
Provides factory functions that return Pydantic FieldInfo with structured metadata using x_ prefixed custom annotations (JSON Schema convention for custom extensions). These annotations power codebook generation and soft validation of controlled vocabularies.
Variable types:
- text: semantic human-readable content, may be multilingual
- nominal: categorical value from a controlled vocabulary
- identifier: structured reference string (URIs, CELEX numbers)
- date: ISO 8601 date string
text_field(description, label=None, source=None, missing_means=None, **kwargs)
Field for semantic human-readable content.
nominal_field(description, label=None, source=None, known_values=None, missing_means=None, **kwargs)
Field for categorical values from a controlled vocabulary.
id_field(description, label=None, source=None, missing_means=None, **kwargs)
Field for structured reference strings (URIs, CELEX numbers).
date_field(description, label=None, source=None, missing_means=None, **kwargs)
Field for ISO 8601 date strings.
warn_unknown_values(instance)
Emit warnings for nominal field values not in their known_values set.
Inspects the model's declared fields for x_known_values metadata. If a field has a value that is not in the known set, emits a UserWarning. Never raises exceptions.
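Since only a UserWarning is emitted, callers decide how strict to be. The sketch below uses the standard warnings filters to either record the warning or escalate it to an exception for one block; load_record is a hypothetical function that merely imitates the soft check and is not part of openstage.

```python
import warnings


def load_record(value: str) -> str:
    """Hypothetical loader that soft-validates one nominal value."""
    if value not in {"OLP", "CNS"}:
        warnings.warn(f"unknown procedure_type {value!r}", UserWarning)
    return value


# Lenient mode: the warning is recorded and processing continues.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_record("XYZ")

# Strict mode: UserWarning becomes an exception inside this block only.
strict_failed = False
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        load_record("XYZ")
    except UserWarning:
        strict_failed = True

print(result, len(caught), strict_failed)
```

The strict variant can be useful in tests or ingestion pipelines where an unexpected vocabulary value should stop the run.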
Codebook
Codebook extraction from openstage model schemas.
Walks Pydantic model_json_schema() output and produces a flat list of field descriptors with x_ metadata annotations. Inherited fields are marked with their source model so documentation generators can link back.
extract_codebook(model_class)
Extract codebook entries from a Pydantic model class.
Returns a list of dicts, one per field, with keys:
- name: field name
- type: JSON Schema type string
- description: field description
- required: whether the field is required
- inherited_from: base class name if inherited, None if own field
- x_variable_type: variable type annotation (if present)
- x_label: human-readable label (if present)
- x_source: data source annotation (if present)
- x_known_values: controlled vocabulary dict (if present)
- x_missing_means: meaning of missing values (if present)
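The walk over a JSON Schema dict could look roughly like the sketch below. It works on a hand-written schema shaped like model_json_schema() output, and entries_from_schema and schema_type are hypothetical names. The source does not say how inherited_from is derived, so the sketch leaves that key out; the reference field is made up for the example.

```python
from typing import Any, Dict, List

X_KEYS = (
    "x_variable_type",
    "x_label",
    "x_source",
    "x_known_values",
    "x_missing_means",
)


def schema_type(prop: Dict[str, Any]) -> str:
    """Flatten a property's type; optional fields use anyOf in Pydantic v2."""
    if "type" in prop:
        return prop["type"]
    options = [option.get("type", "?") for option in prop.get("anyOf", [])]
    return " | ".join(options) if options else "any"


def entries_from_schema(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical extractor: one flat dict per schema property."""
    required = set(schema.get("required", []))
    entries = []
    for name, prop in schema.get("properties", {}).items():
        entry = {
            "name": name,
            "type": schema_type(prop),
            "description": prop.get("description"),
            "required": name in required,
        }
        # Copy only the x_ annotations that are actually present.
        entry.update({key: prop[key] for key in X_KEYS if key in prop})
        entries.append(entry)
    return entries


schema = {
    "title": "MyProcedure",
    "type": "object",
    "properties": {
        "procedure_type": {
            "anyOf": [{"type": "string"}, {"type": "null"}],
            "default": None,
            "description": "Type of legislative procedure.",
            "x_variable_type": "nominal",
            "x_label": "Procedure type",
        },
        "reference": {  # hypothetical required field
            "type": "string",
            "description": "Procedure reference.",
            "x_variable_type": "identifier",
        },
    },
    "required": ["reference"],
}

entries = entries_from_schema(schema)
for entry in entries:
    print(entry["name"], entry["type"], entry["required"])
```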
codebook_to_markdown(entries)
Render codebook entries as a markdown document.
Produces a field table followed by an appendix listing controlled vocabularies for fields that have x_known_values.
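A table-plus-appendix rendering can be sketched in a few lines. The column layout and appendix format below are assumptions and may differ from what codebook_to_markdown() actually produces; render_codebook and cell are hypothetical names, and the entries are illustrative.

```python
def cell(value) -> str:
    """Render a value as table-cell text, escaping pipes so rows stay intact."""
    return "" if value is None else str(value).replace("|", "\\|")


def render_codebook(entries) -> str:
    """Hypothetical renderer: field table, then controlled vocabularies."""
    lines = [
        "| Field | Type | Variable type | Description |",
        "|---|---|---|---|",
    ]
    for entry in entries:
        row = [
            entry["name"],
            entry["type"],
            entry.get("x_variable_type"),
            entry.get("description"),
        ]
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")

    # Appendix: only fields that declare a controlled vocabulary.
    with_vocab = [e for e in entries if e.get("x_known_values")]
    if with_vocab:
        lines += ["", "Controlled vocabularies"]
        for entry in with_vocab:
            lines += ["", entry["name"] + ":"]
            for code, meaning in entry["x_known_values"].items():
                lines.append(f"- {code}: {meaning}")
    return "\n".join(lines)


entries = [
    {
        "name": "procedure_type",
        "type": "string | null",
        "description": "Type of legislative procedure.",
        "x_variable_type": "nominal",
        "x_known_values": {"OLP": "Ordinary Legislative Procedure"},
    },
    {
        "name": "title",  # hypothetical field
        "type": "string",
        "description": "Title.",
        "x_variable_type": "text",
    },
]
print(render_codebook(entries))
```

Escaping the pipe in a type such as "string | null" matters here, because an unescaped pipe would split the cell and misalign the row.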