Fields and Codebooks

openstage provides a field metadata system for annotating case-specific model properties with structured information: variable types, controlled vocabularies, data source provenance, and human-readable labels. These annotations power codebook generation and soft validation.

Field metadata system

Case models declare fields with factory functions from openstage.models.fields. Each function returns a Pydantic FieldInfo whose json_schema_extra holds x_-prefixed custom annotations.

from openstage.models.fields import nominal_field, id_field, date_field, text_field

# Procedure: the openstage base procedure model (its import path is not shown here).
class MyProcedure(Procedure):
    procedure_type: str | None = nominal_field(
        description="Type of legislative procedure.",
        label="Procedure type",
        source="my_source:procedure_type",
        known_values={
            "OLP": "Ordinary Legislative Procedure",
            "CNS": "Consultation procedure",
        },
        missing_means="Procedure type not recorded.",
        default=None,
    )

Factory functions

Function          Variable type   Use for
text_field()      text            Semantic human-readable content
nominal_field()   nominal         Categorical values from a controlled vocabulary
id_field()        identifier      Structured reference strings (URIs, CELEX numbers)
date_field()      date            ISO 8601 date strings

Metadata annotations

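Each factory function sets x_variable_type itself and stores its optional keyword arguments (label, source, known_values, missing_means) under x_-prefixed keys:

  • x_variable_type -- set by the factory: one of text, nominal, identifier, date
  • x_label -- human-readable field label (from label)
  • x_source -- data source provenance, e.g., a CDM property URI (from source)
  • x_known_values -- controlled vocabulary dict mapping codes to descriptions (from known_values; nominal_field() only)
  • x_missing_means -- what a missing value means substantively (from missing_means)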

Soft validation

Fields that declare x_known_values are soft-validated on construction via warn_unknown_values(). Unknown values produce a UserWarning rather than raising an exception, so unexpected data is flagged without blocking processing.
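The soft-validation rule can be sketched with the standard warnings module. To stay self-contained, this hypothetical warn_unknown_values_sketch() reads the x_ annotations from a plain mapping shaped like each field's json_schema_extra instead of from a model instance, and it treats None as a missing value rather than an unknown one; both are assumptions, not documented openstage behavior.

```python
import warnings
from collections.abc import Mapping
from typing import Any


def warn_unknown_values_sketch(values: Mapping[str, Any],
                               metadata: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Warn, never raise, for values outside a field's x_known_values.

    `metadata` maps field names to x_ annotation dicts (hypothetical input shape).
    Returns the names of fields whose values were flagged.
    """
    flagged = []
    for name, extra in metadata.items():
        known = extra.get("x_known_values")
        value = values.get(name)
        # Skip fields without a vocabulary and missing values (assumption: None = missing).
        if not known or value is None:
            continue
        # Compare against a tuple of codes so unhashable values cannot raise TypeError.
        if value not in tuple(known):
            warnings.warn(f"{name}: unknown value {value!r}", UserWarning, stacklevel=2)
            flagged.append(name)
    return flagged


metadata = {
    "procedure_type": {
        "x_variable_type": "nominal",
        "x_known_values": {"OLP": "Ordinary Legislative Procedure", "CNS": "Consultation procedure"},
    },
}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_unknown_values_sketch({"procedure_type": "XYZ"}, metadata)
print(flagged, [str(w.message) for w in caught])
# ['procedure_type'] ["procedure_type: unknown value 'XYZ'"]
```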

Codebook extraction

The field metadata embedded in case models can be extracted as a structured codebook for documentation or data dictionaries.

from openstage.models.codebook import extract_codebook, codebook_to_markdown
from openstage.models.eu import EUProcedure

entries = extract_codebook(EUProcedure)
print(codebook_to_markdown(entries))

Each codebook entry includes:

  • name -- field name
  • type -- JSON Schema type string
  • description -- field description
  • required -- whether the field is required
  • inherited_from -- name of the base class the field is inherited from, or None if the model declares it itself
  • x_variable_type, x_label, x_source, x_known_values, x_missing_means -- field metadata annotations, each included only if the field declares it

codebook_to_markdown() renders a field table with an appendix listing controlled vocabularies.
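A renderer with the same overall shape, a field table followed by a vocabulary appendix, fits in a few lines of standard-library Python. The function name, column set, headings, and pipe escaping below are illustrative choices, not openstage's actual output format.

```python
from typing import Any


def codebook_markdown_sketch(entries: list[dict[str, Any]], title: str = "Codebook") -> str:
    """Hypothetical renderer: field table first, controlled vocabularies in an appendix."""
    lines = [
        f"# {title}",
        "",
        "| Field | Label | Type | Variable type | Required | Description |",
        "|---|---|---|---|---|---|",
    ]
    for entry in entries:
        cells = [
            entry["name"],
            entry.get("x_label", ""),
            entry["type"],
            entry.get("x_variable_type", ""),
            "yes" if entry["required"] else "no",
            entry.get("description", ""),
        ]
        # Escape pipes (e.g. in "string | null") so they do not split table cells.
        lines.append("| " + " | ".join(str(cell).replace("|", "\\|") for cell in cells) + " |")

    with_vocab = [entry for entry in entries if entry.get("x_known_values")]
    if with_vocab:
        lines += ["", "## Appendix: controlled vocabularies"]
        for entry in with_vocab:
            lines += ["", f"### {entry['name']}", ""]
            lines += [f"- `{code}`: {meaning}" for code, meaning in entry["x_known_values"].items()]
    return "\n".join(lines)


entries = [
    {"name": "reference", "type": "string", "description": "Procedure reference.",
     "required": True, "inherited_from": None, "x_variable_type": "identifier"},
    {"name": "procedure_type", "type": "string | null",
     "description": "Type of legislative procedure.", "required": False,
     "inherited_from": None, "x_variable_type": "nominal", "x_label": "Procedure type",
     "x_known_values": {"OLP": "Ordinary Legislative Procedure", "CNS": "Consultation procedure"}},
]
print(codebook_markdown_sketch(entries))
```

Keeping vocabularies in an appendix keeps the main table one row per field, even when a nominal field has dozens of codes.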

API reference

Field helpers

Field metadata helpers for openstage models.

Provides factory functions that return a Pydantic FieldInfo with structured metadata stored as x_-prefixed custom annotations (JSON Schema convention for custom extensions). These annotations power codebook generation and soft validation of controlled vocabularies.

Variable types:

  • text: semantic human-readable content, may be multilingual
  • nominal: categorical value from a controlled vocabulary
  • identifier: structured reference string (URIs, CELEX numbers)
  • date: ISO 8601 date string

text_field(description, label=None, source=None, missing_means=None, **kwargs)

Field for semantic human-readable content.

nominal_field(description, label=None, source=None, known_values=None, missing_means=None, **kwargs)

Field for categorical values from a controlled vocabulary.

id_field(description, label=None, source=None, missing_means=None, **kwargs)

Field for structured reference strings (URIs, CELEX numbers).

date_field(description, label=None, source=None, missing_means=None, **kwargs)

Field for ISO 8601 date strings.

warn_unknown_values(instance)

Emit warnings for nominal field values not in their known_values set.

Inspects the model's declared fields for x_known_values metadata. If a field has a value that is not in the known set, emits a UserWarning. Never raises exceptions.
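Because warn_unknown_values() only warns, calling code decides how strict to be through the standard warnings module. The sketch below uses a hypothetical load_record() stand-in that emits the same kind of UserWarning, since constructing a real openstage model is outside this section.

```python
import warnings

KNOWN_PROCEDURE_TYPES = {"OLP", "CNS"}


def load_record(record: dict) -> dict:
    """Hypothetical stand-in for building a case model that soft-validates on construction."""
    value = record.get("procedure_type")
    if value is not None and value not in KNOWN_PROCEDURE_TYPES:
        warnings.warn(f"procedure_type: unknown value {value!r}", UserWarning)
    return record


records = [{"procedure_type": "OLP"}, {"procedure_type": "XYZ"}, {"procedure_type": None}]

# Policy 1: keep loading, but collect the warnings for later review.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", UserWarning)
    loaded = [load_record(record) for record in records]
unknown = [str(w.message) for w in caught if issubclass(w.category, UserWarning)]
print(len(loaded), unknown)

# Policy 2: escalate to exceptions, e.g. in a test that must see only known codes.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        load_record({"procedure_type": "XYZ"})
    except UserWarning as exc:
        print("rejected:", exc)
```

The same escalation can be applied process-wide with python -W error::UserWarning. Note that either filter matches every UserWarning, not only those raised by soft validation.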

Codebook

Codebook extraction from openstage model schemas.

Walks Pydantic model_json_schema() output and produces a flat list of field descriptors with x_ metadata annotations. Inherited fields are marked with their source model so documentation generators can link back.

extract_codebook(model_class)

Extract codebook entries from a Pydantic model class.

Returns a list of dicts, one per field, with keys:

  • name: field name
  • type: JSON Schema type string
  • description: field description
  • required: whether the field is required
  • inherited_from: name of the base class the field is inherited from, or None if the model declares it itself
  • x_variable_type: variable type annotation (if present)
  • x_label: human-readable label (if present)
  • x_source: data source annotation (if present)
  • x_known_values: controlled vocabulary dict (if present)
  • x_missing_means: meaning of missing values (if present)

codebook_to_markdown(entries)

Render codebook entries as a markdown document.

Produces a field table followed by an appendix listing controlled vocabularies for fields that have x_known_values.