Templates

Templates are YAML files that declare how to extract structured data from an RDF graph. All CDM-specific knowledge lives in templates, not in Python code.

Structure

A template has these top-level sections:

version: "1"

prefixes:
  cdm: "http://publications.europa.eu/ontology/cdm#"
  skos: "http://www.w3.org/2004/02/skos/core#"

languages:
  preferred: ["en", "fr", "de"]
  fallback: "any"

same_as_merge: true   # optional, default: true

entities:
  procedure:
    find:
      type: "cdm:procedure_interinstitutional"
      include_subclasses: true

    fields:
      title:
        predicate:
          - "cdm:title"
          - "cdm:dossier_title"
        multilingual: true

    relations:
      events:
        predicate:
          - "cdm:dossier_contains_event_legal"
          - "cdm:dossier_contains_event"
        target_template: "event"
        cardinality: "many"

Prefixes

Maps short prefixes to full namespace URIs. All predicate references in the template use prefixed form (cdm:title instead of the full URI).

Languages

Controls multilingual field resolution.

preferred: Ordered list of preferred language codes. All languages present in the data are returned; the preferred list only affects fallback ordering.
fallback: "any" includes untagged literals under the "_" key. "none" skips them.

same_as_merge

Controls owl:sameAs entity merging (default: true). When enabled, instances linked by owl:sameAs are grouped into equivalence classes and their triples are merged into a single output entity. The canonical URI is selected by preferring resource/procedure/ over pegase/cellar URIs. All alias URIs are listed in the _same_as metadata field.

Set to false if you want each URI extracted as a separate entity (e.g., for debugging or for non-CDM data where owl:sameAs has different semantics).

This setting can be overridden per call via extract(..., merge_same_as=False).

Entities

Each entity has three parts:

find: How to discover instances in the graph (type URI, optional include_subclasses).
fields: Scalar or multilingual values to extract.
relations: Links to other entities for nested extraction.

The first entity in the template is the root entity, extracted by default.

Field options

Option	Default	Description
`predicate`	required	Prefixed URI, wildcard (`cdm:date_*`), or list of aliases
`multilingual`	`false`	Return a language-keyed dict instead of a scalar
`cardinality`	`"one"`	`"one"` (first match wins) or `"many"` (collect all matches)
`collect`	`null`	`"dict"` to collect wildcard matches as key-value pairs
`direction`	`"forward"`	`"forward"` (subject -> object) or `"inverse"` (object -> subject)
`datatype`	`null`	XSD datatype hint (e.g., `"xsd:date"`)
`follow`	`null`	One-hop traversal for label resolution (see below)
`exclude`	`[]`	Predicates to skip in wildcard matches
`required`	`false`	Log a warning if this field is missing
`transform`	`null`	Named transform to apply to values (see below)

Relation options

Option	Default	Description
`predicate`	required	Prefixed URI or list of aliases
`target_template`	`null`	Entity name for recursive nested extraction
`cardinality`	`"many"`	`"one"` or `"many"`
`direction`	`"forward"`	`"forward"` or `"inverse"`
`inverse_predicate`	`[]`	Additional predicates for reverse lookup (object -> subject), with `owl:sameAs` alias expansion
`transform`	`null`	Named transform to apply to values

Predicate aliasing

CDM encodes the same fact under multiple predicates simultaneously (see CDM Patterns). Templates handle this with predicate lists:

date:
  predicate:
    - "cdm:event_legal_date"   # fully qualified
    - "cdm:event_date"         # entity-prefixed
    - "cdm:date"               # short form
  datatype: "xsd:date"

For cardinality: "one", the first alias that produces data wins. For cardinality: "many", results from all aliases are merged (deduplicated).

Wildcard fields

Wildcards match multiple predicates using fnmatch patterns:

dates:
  predicate: "cdm:date_*"
  collect: "dict"

This collects all predicates matching cdm:date_* into a dict keyed by local name:

{"date": "2019-12-11", "date_adopted": "2021-06-09"}

Use exclude to skip specific predicates from a wildcard match:

other_properties:
  predicate: "cdm:event_legal_*"
  collect: "dict"
  exclude:
    - "cdm:event_legal_date"
    - "cdm:event_legal_type"

Follow (one-hop traversal)

The follow option resolves a value by following one additional predicate. Useful for getting labels from concept URIs:

resource_type_label:
  predicate: "cdm:work_has_resource-type"
  follow:
    predicate: "skos:prefLabel"
    multilingual: true

This first gets the concept URI via cdm:work_has_resource-type, then follows skos:prefLabel on that concept to get the human-readable label.

Inverse predicates on relations

Some entities point back to their parent rather than the parent pointing to them. The inverse_predicate option handles this by looking for nodes that point TO the current entity:

events:
  predicate:
    - "cdm:dossier_contains_event_legal"
  inverse_predicate:
    - "cdm:event_legal_part_of_dossier"
  target_template: "event"
  cardinality: "many"

Inverse predicate lookup automatically expands owl:sameAs aliases, handling CDM's multiple-URI-per-entity pattern.

Transforms

The transform option applies a named function to extracted values:

procedure_type:
  predicate: "cdm:has_type"
  transform: "uri_local_name"

Built-in transforms:

year_from_date: "2019-12-11" -> "2019"
uri_local_name: "http://.../concept/COD" -> "COD"

Custom transforms are passed at extraction time:

results = extract(g, template="my_template", transforms={
    "strip_prefix": lambda v: v.removeprefix("http://example.org/"),
})

Writing a custom template

A minimal custom template:

version: "1"

prefixes:
  cdm: "http://publications.europa.eu/ontology/cdm#"

languages:
  preferred: ["en"]
  fallback: "any"

entities:
  my_entity:
    find:
      type: "cdm:work"
      include_subclasses: true

    fields:
      title:
        predicate: "cdm:work_title"
        multilingual: true
      date:
        predicate: "cdm:work_date_document"
        datatype: "xsd:date"

    relations: {}

Save as my_template.yaml and use it:

results = extract(g, template="my_template.yaml")

Built-in templates

Template	Root entity	Description
`eu_procedure`	procedure	Legislative procedures with events and documents
`eu_document`	document	Documents, expressions, and manifestations

Use list_builtin_templates() to see the current list.