Templates
Templates are YAML files that declare how to extract structured data from an RDF graph. All CDM-specific knowledge lives in templates, not in Python code.
Structure
A template has these top-level sections:
version: "1"
prefixes:
  cdm: "http://publications.europa.eu/ontology/cdm#"
  skos: "http://www.w3.org/2004/02/skos/core#"
languages:
  preferred: ["en", "fr", "de"]
  fallback: "any"
same_as_merge: true  # optional, default: true
entities:
  procedure:
    find:
      type: "cdm:procedure_interinstitutional"
      include_subclasses: true
    fields:
      title:
        predicate:
          - "cdm:title"
          - "cdm:dossier_title"
        multilingual: true
    relations:
      events:
        predicate:
          - "cdm:dossier_contains_event_legal"
          - "cdm:dossier_contains_event"
        target_template: "event"
        cardinality: "many"
Prefixes
Maps short prefixes to full namespace URIs. All predicate references in the template use prefixed form (cdm:title instead of the full URI).
Languages
Controls multilingual field resolution.
- preferred: Ordered list of preferred language codes. All languages present in the data are returned; the preferred list only affects fallback ordering.
- fallback: "any" includes untagged literals under the "_" key. "none" skips them.
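The two language settings can be illustrated with a small standard-library sketch. The helpers `resolve_multilingual` and `pick_best` are hypothetical names, not part of the library, and `pick_best` reflects only one possible reading of "fallback ordering".

```python
def resolve_multilingual(literals, fallback="any"):
    """Build a language-keyed dict from (text, lang) pairs.

    lang is None for untagged literals. Hypothetical helper, not library API.
    """
    result = {}
    for text, lang in literals:
        if lang is None:
            # Untagged literals go under "_" only when fallback is "any".
            if fallback == "any":
                result.setdefault("_", text)
            continue
        # Every language present in the data is kept.
        result.setdefault(lang, text)
    return result


def pick_best(values_by_lang, preferred):
    """Assumed reading of 'fallback ordering': try preferred codes in order,
    then fall back to whatever value is available."""
    for lang in preferred:
        if lang in values_by_lang:
            return values_by_lang[lang]
    return next(iter(values_by_lang.values()), None)
```

With `fallback="none"` the `"_"` key is simply absent from the result; tagged languages are unaffected either way.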
same_as_merge
Controls owl:sameAs entity merging (default: true). When enabled, instances linked by owl:sameAs are grouped into equivalence classes and their triples are merged into a single output entity. The canonical URI is selected by preferring resource/procedure/ over pegase/cellar URIs. All alias URIs are listed in the _same_as metadata field.
Set to false if you want each URI extracted as a separate entity (e.g., for debugging or for non-CDM data where owl:sameAs has different semantics).
This setting can be overridden per call via extract(..., merge_same_as=False).
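As a rough illustration of the merging described above, the sketch below groups URIs into equivalence classes with a union-find structure and picks a canonical URI. It is not the library's implementation: the function names and the example.org URIs are hypothetical, and the tie-breaking rule (alphabetical order) is an assumption.

```python
from collections import defaultdict


def same_as_classes(same_as_pairs):
    """Group URIs linked by owl:sameAs pairs into equivalence classes."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for uri in list(parent):
        groups[find(uri)].add(uri)
    return list(groups.values())


def canonical_uri(uris):
    """Prefer URIs containing resource/procedure/; ties broken alphabetically (assumption)."""
    return min(uris, key=lambda u: ("resource/procedure/" not in u, u))


def merge_summary(uris):
    """Return the canonical URI plus the aliases that would go into _same_as."""
    canonical = canonical_uri(uris)
    return {"uri": canonical, "_same_as": sorted(set(uris) - {canonical})}
```

In this sketch, a class containing one `resource/procedure/` URI and two other URIs yields a single entity whose `_same_as` list holds the two remaining URIs.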
Entities
Each entity has three parts:
- find: How to discover instances in the graph (type URI, optional include_subclasses).
- fields: Scalar or multilingual values to extract.
- relations: Links to other entities for nested extraction.
The first entity in the template is the root entity, extracted by default.
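The find step can be pictured as a type lookup that optionally widens to subclasses. The sketch below works on plain (subject, predicate, object) tuples and assumes that subclasses are declared with rdfs:subClassOf; `find_instances` and the `ex:` names are hypothetical and only illustrate the idea.

```python
def find_instances(triples, type_uri, include_subclasses=False):
    """Return sorted subjects typed as type_uri, optionally including subclasses."""
    types = {type_uri}
    if include_subclasses:
        # Expand to the transitive closure of rdfs:subClassOf (assumed mechanism).
        changed = True
        while changed:
            changed = False
            for s, p, o in triples:
                if p == "rdfs:subClassOf" and o in types and s not in types:
                    types.add(s)
                    changed = True
    return sorted({s for s, p, o in triples if p == "rdf:type" and o in types})


triples = [
    ("ex:special_procedure", "rdfs:subClassOf", "cdm:procedure_interinstitutional"),
    ("ex:p1", "rdf:type", "cdm:procedure_interinstitutional"),
    ("ex:p2", "rdf:type", "ex:special_procedure"),
]
print(find_instances(triples, "cdm:procedure_interinstitutional", include_subclasses=True))
```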
Field options
| Option | Default | Description |
|---|---|---|
| predicate | required | Prefixed URI, wildcard (cdm:date_*), or list of aliases |
| multilingual | false | Return a language-keyed dict instead of a scalar |
| cardinality | "one" | "one" (first match wins) or "many" (collect all matches) |
| collect | null | "dict" to collect wildcard matches as key-value pairs |
| direction | "forward" | "forward" (subject -> object) or "inverse" (object -> subject) |
| datatype | null | XSD datatype hint (e.g., "xsd:date") |
| follow | null | One-hop traversal for label resolution (see below) |
| exclude | [] | Predicates to skip in wildcard matches |
| required | false | Log a warning if this field is missing |
| transform | null | Named transform to apply to values (see below) |
Relation options
| Option | Default | Description |
|---|---|---|
| predicate | required | Prefixed URI or list of aliases |
| target_template | null | Entity name for recursive nested extraction |
| cardinality | "many" | "one" or "many" |
| direction | "forward" | "forward" or "inverse" |
| inverse_predicate | [] | Additional predicates for reverse lookup (object -> subject), with owl:sameAs alias expansion |
| transform | null | Named transform to apply to values |
Predicate aliasing
CDM encodes the same fact under multiple predicates simultaneously (see CDM Patterns). Templates handle this with predicate lists:
date:
  predicate:
    - "cdm:event_legal_date"  # fully qualified
    - "cdm:event_date"        # entity-prefixed
    - "cdm:date"              # short form
  datatype: "xsd:date"
For cardinality: "one", aliases are tried in the order listed and the first one that produces data wins. For cardinality: "many", results from all aliases are merged and deduplicated.
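The two cardinality behaviours can be sketched over plain (subject, predicate, object) tuples. `resolve_aliases` is a hypothetical helper written for illustration; it assumes aliases are tried in the order they appear in the template.

```python
def resolve_aliases(triples, subject, predicates, cardinality="one"):
    """Resolve a field declared with a list of alias predicates."""
    if cardinality == "one":
        # First alias that yields any value wins.
        for pred in predicates:
            for s, p, o in triples:
                if s == subject and p == pred:
                    return o
        return None

    # "many": merge values from every alias, dropping duplicates but keeping order.
    merged = []
    for pred in predicates:
        for s, p, o in triples:
            if s == subject and p == pred and o not in merged:
                merged.append(o)
    return merged
```

Because CDM often repeats the same value under several predicates, the deduplication in the "many" branch keeps the merged list from containing the same date or URI several times.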
Wildcard fields
Wildcards match multiple predicates using fnmatch patterns:
dates:
  predicate: "cdm:date_*"
  collect: "dict"
This collects all predicates matching cdm:date_* into a dict keyed by local name:
{"date": "2019-12-11", "date_adopted": "2021-06-09"}
Use exclude to skip specific predicates from a wildcard match:
other_properties:
  predicate: "cdm:event_legal_*"
  collect: "dict"
  exclude:
    - "cdm:event_legal_date"
    - "cdm:event_legal_type"
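A wildcard field with exclude can be approximated with Python's fnmatch module. The sketch below is illustrative only: `collect_wildcard` is a hypothetical helper, the input is a list of (prefixed predicate, value) pairs for one entity, and `cdm:event_legal_number` is a made-up predicate used as sample data.

```python
from fnmatch import fnmatchcase


def collect_wildcard(properties, pattern, exclude=()):
    """Collect values whose prefixed predicate matches pattern, keyed by local name."""
    result = {}
    for predicate, value in properties:
        if predicate in exclude:
            continue
        if fnmatchcase(predicate, pattern):
            # Local name is the part after the prefix, e.g. "cdm:foo" -> "foo".
            local_name = predicate.split(":", 1)[-1]
            result[local_name] = value
    return result


properties = [
    ("cdm:event_legal_date", "2019-12-11"),
    ("cdm:event_legal_type", "ADP"),
    ("cdm:event_legal_number", "3"),  # hypothetical predicate
]
print(collect_wildcard(
    properties,
    "cdm:event_legal_*",
    exclude=["cdm:event_legal_date", "cdm:event_legal_type"],
))
```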
Follow (one-hop traversal)
The follow option resolves a value by following one additional predicate. Useful for getting labels from concept URIs:
resource_type_label:
  predicate: "cdm:work_has_resource-type"
  follow:
    predicate: "skos:prefLabel"
    multilingual: true
This first gets the concept URI via cdm:work_has_resource-type, then follows skos:prefLabel on that concept to get the human-readable label.
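The two steps can be sketched over plain tuples, with literals modelled as (text, lang) pairs and URIs as strings. `follow_label` and the sample identifiers are hypothetical and stand in for whatever the library does internally.

```python
def follow_label(triples, subject, predicate, follow_predicate):
    """Hop 1: subject -> concept URI. Hop 2: concept -> language-keyed labels."""
    concepts = [o for s, p, o in triples if s == subject and p == predicate]
    labels = {}
    for concept in concepts:
        for s, p, o in triples:
            if s == concept and p == follow_predicate and isinstance(o, tuple):
                text, lang = o
                # Untagged labels are stored under "_".
                labels.setdefault(lang or "_", text)
    return labels


triples = [
    ("ex:work1", "cdm:work_has_resource-type", "ex:concept/REG"),
    ("ex:concept/REG", "skos:prefLabel", ("Regulation", "en")),
    ("ex:concept/REG", "skos:prefLabel", ("Règlement", "fr")),
]
print(follow_label(triples, "ex:work1", "cdm:work_has_resource-type", "skos:prefLabel"))
```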
Inverse predicates on relations
Some entities point back to their parent rather than the parent pointing to them. The inverse_predicate option handles this by looking for nodes that point to the current entity:
events:
  predicate:
    - "cdm:dossier_contains_event_legal"
  inverse_predicate:
    - "cdm:event_legal_part_of_dossier"
  target_template: "event"
  cardinality: "many"
Inverse predicate lookup automatically expands owl:sameAs aliases, handling CDM's multiple-URI-per-entity pattern.
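A combined forward and inverse lookup might look like the sketch below. It assumes the owl:sameAs aliases of the current entity are already known and applies them only to the inverse direction, as the text describes. `related_nodes` and the `ex:` identifiers are hypothetical.

```python
def related_nodes(triples, entity, predicates, inverse_predicates, aliases=()):
    """Collect related nodes via forward predicates and inverse predicates."""
    targets = {entity, *aliases}  # inverse lookup also matches alias URIs
    found = []
    for s, p, o in triples:
        if s == entity and p in predicates:
            candidate = o  # parent points to child
        elif o in targets and p in inverse_predicates:
            candidate = s  # child points back to parent (or one of its aliases)
        else:
            continue
        if candidate not in found:
            found.append(candidate)
    return found
```

Without alias expansion, a child that points at a different URI for the same dossier would be missed, which is the case the owl:sameAs handling is meant to cover.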
Transforms
The transform option applies a named function to extracted values:
procedure_type:
  predicate: "cdm:has_type"
  transform: "uri_local_name"
Built-in transforms:
- year_from_date: "2019-12-11" -> "2019"
- uri_local_name: "http://.../concept/COD" -> "COD"
Custom transforms are passed at extraction time:
results = extract(g, template="my_template", transforms={
    "strip_prefix": lambda v: v.removeprefix("http://example.org/"),
})
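To make the transform mechanism concrete, here is a sketch of how named transforms could be looked up and applied. The two functions reproduce the documented input/output behaviour of the built-ins but are not the library's actual code; `BUILTIN_TRANSFORMS` and `apply_transform` are hypothetical names, and letting custom transforms override built-ins of the same name is an assumption.

```python
def year_from_date(value):
    """'2019-12-11' -> '2019' (assumes an ISO 8601 date string)."""
    return value[:4]


def uri_local_name(value):
    """Return the last path or fragment segment of a URI."""
    tail = value.rstrip("/").rsplit("/", 1)[-1]
    return tail.rsplit("#", 1)[-1]


BUILTIN_TRANSFORMS = {
    "year_from_date": year_from_date,
    "uri_local_name": uri_local_name,
}


def apply_transform(name, value, custom=None):
    """Look up a transform by name; custom entries override built-ins (assumption)."""
    registry = {**BUILTIN_TRANSFORMS, **(custom or {})}
    return registry[name](value)
```

Note that `str.removeprefix`, used in the custom transform above, requires Python 3.9 or later.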
Writing a custom template
A minimal custom template:
version: "1"
prefixes:
  cdm: "http://publications.europa.eu/ontology/cdm#"
languages:
  preferred: ["en"]
  fallback: "any"
entities:
  my_entity:
    find:
      type: "cdm:work"
      include_subclasses: true
    fields:
      title:
        predicate: "cdm:work_title"
        multilingual: true
      date:
        predicate: "cdm:work_date_document"
        datatype: "xsd:date"
    relations: {}
Save as my_template.yaml and use it:
results = extract(g, template="my_template.yaml")
Built-in templates
| Template | Root entity | Description |
|---|---|---|
| eu_procedure | procedure | Legislative procedures with events and documents |
| eu_document | document | Documents, expressions, and manifestations |
Use list_builtin_templates() to see the current list.