backstage

backstage is the data collection and processing pipeline for the openstage project. It collects legislative procedure data from official sources, processes it into structured formats, and publishes research-ready datasets.

What backstage does

backstage handles the operational side of data collection: querying official endpoints, downloading raw data, parsing it into typed openstage models, and publishing packaged datasets to Harvard Dataverse.

flowchart LR
    A[Source endpoints] --> B[collect] --> C[download] --> D[parse] --> E[package] --> F[publish] --> DV[Harvard Dataverse]
    C & D & E -.- S3[(S3)]

Each step runs independently and exchanges data with the other steps through S3 rather than passing it in memory. Because every step reads its input from S3 and writes its output back, any step can be re-run without repeating the earlier ones.
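The decoupling between steps can be sketched as follows. This is a minimal illustration under stated assumptions, not backstage's actual code: a plain dict stands in for S3, and the key layout (`collect/`, `raw/`, `parsed/`) and the function bodies are hypothetical.

```python
from typing import Dict

# Hypothetical stand-in for an S3 bucket: object key -> object bytes.
Store = Dict[str, bytes]


def download(store: Store) -> None:
    """Read the collected URL list and write one raw object per URL."""
    urls = store["collect/urls.txt"].decode().splitlines()
    for i, url in enumerate(urls):
        # A real step would fetch the URL; here the payload is faked.
        store[f"raw/{i}.txt"] = f"payload from {url}".encode()


def parse(store: Store) -> None:
    """Read every raw object and write a parsed counterpart."""
    raw_keys = sorted(k for k in store if k.startswith("raw/"))
    for key in raw_keys:
        # Placeholder transformation standing in for real parsing.
        store[key.replace("raw/", "parsed/", 1)] = store[key].upper()


# Output of a (hypothetical) earlier collect step, already in the store.
store: Store = {
    "collect/urls.txt": b"https://example.org/a\nhttps://example.org/b",
}

download(store)
parse(store)

# Re-running parse alone needs only the raw objects already in the store;
# download is not repeated.
parse(store)
```

Because each function takes only the store, a step's inputs are fully described by the keys it reads, which is what makes re-running a single step safe in this sketch.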

Supported cases

backstage organizes data collection by case, where each case represents a jurisdiction or data domain with its own collection logic, source endpoints, and processing pipeline.

| Case | Description | Status |
| --- | --- | --- |
| EU | European Union interinstitutional legislative procedures | In development |
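One way to picture the per-case organization is a small registry keyed by case code. The sketch below is an assumption for illustration only: the `Case` class, the `register` and `get_case` helpers, and the placeholder endpoint URL are hypothetical and are not taken from backstage.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Case:
    """Hypothetical description of one jurisdiction or data domain."""
    code: str
    description: str
    status: str
    endpoints: Tuple[str, ...] = ()


# Hypothetical registry: case code -> case definition.
CASES: Dict[str, Case] = {}


def register(case: Case) -> Case:
    """Add a case to the registry, rejecting duplicate codes."""
    if case.code in CASES:
        raise ValueError(f"case {case.code!r} is already registered")
    CASES[case.code] = case
    return case


def get_case(code: str) -> Case:
    """Look up a case by code, with a readable error for unknown codes."""
    try:
        return CASES[code]
    except KeyError:
        raise KeyError(f"unknown case {code!r}; known: {sorted(CASES)}") from None


register(
    Case(
        code="EU",
        description="European Union interinstitutional legislative procedures",
        status="In development",
        # Placeholder URL, not a real source endpoint.
        endpoints=("https://example.org/eu-endpoint",),
    )
)
```

In a layout like this, adding a jurisdiction means registering one more `Case` along with its own collection logic, leaving existing cases untouched.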
| Project | Role |
| --- | --- |
| openstage | Data models, adapters, and codebooks |
| openbasement | Template-based RDF extraction from EU Cellar data |
| backstage (this project) | Data collection, processing, and publishing |