backstage

backstage is the data collection and processing pipeline for the openstage project. It collects legislative procedure data from official sources, processes it into structured formats, and publishes research-ready datasets.

What backstage does

backstage handles the operational side of data collection: querying official endpoints, downloading raw data, parsing it into typed openstage models, and publishing packaged datasets to Harvard Dataverse.

flowchart LR
    A[Source endpoints] --> B[collect] --> C[download] --> D[parse] --> E[package] --> F[publish] --> DV[Harvard Dataverse]
    C & D & E -.- S3[(S3)]

Each step runs independently and exchanges data with the other steps through S3 rather than passing it in memory, so any step can be re-run without repeating the earlier ones.
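As a rough illustration of this hand-off pattern, the sketch below has two steps that share nothing except an object store. It is not backstage's actual code: `ObjectStore` is a hypothetical in-memory stand-in for an S3 bucket, and the `raw/` and `parsed/` key prefixes, the step bodies, and the sample payload are all assumptions made for the example.

```python
from typing import Dict, List


class ObjectStore:
    """Hypothetical in-memory stand-in for an S3 bucket (a real run would use S3)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self, prefix: str) -> List[str]:
        # Sorted list of keys under a prefix, similar in spirit to listing a bucket.
        return sorted(k for k in self._objects if k.startswith(prefix))


def download(store: ObjectStore) -> None:
    # Assumed sample payload; a real step would fetch it from a source endpoint.
    store.put("raw/item-1.txt", b"  Example Procedure  ")


def parse(store: ObjectStore) -> None:
    # Reads only what an earlier step left in the store; nothing is handed over in memory.
    for key in store.keys("raw/"):
        text = store.get(key).decode("utf-8").strip()
        store.put(key.replace("raw/", "parsed/", 1), text.encode("utf-8"))


store = ObjectStore()
download(store)
parse(store)
parse(store)  # re-running parse alone reuses the stored raw data
print(store.keys("parsed/"))
```

Because `parse` only depends on what is stored under `raw/`, it can be run again after a parser fix without calling `download` a second time.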

Supported cases

backstage organizes data collection by case, where each case represents a jurisdiction or data domain with its own collection logic, source endpoints, and processing pipeline.

Case	Description	Status
EU	European Union interinstitutional legislative procedures	In development
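One way to picture a case is as an entry in a small registry that bundles its metadata with its endpoints. The sketch below is an assumption about how this could look, not a description of backstage's internals: the `Case` dataclass, the `CASES` registry, and the `register` and `get_case` helpers are hypothetical names, and the endpoint list is left empty because the actual source endpoints are not listed here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Case:
    """Hypothetical description of one case (jurisdiction or data domain)."""

    key: str
    description: str
    status: str
    # Left empty on purpose: the real source endpoints are not given in this document.
    source_endpoints: Tuple[str, ...] = ()


CASES: Dict[str, Case] = {}


def register(case: Case) -> Case:
    # Refuse duplicates so each case key maps to exactly one definition.
    if case.key in CASES:
        raise ValueError(f"case {case.key!r} is already registered")
    CASES[case.key] = case
    return case


def get_case(key: str) -> Case:
    try:
        return CASES[key]
    except KeyError:
        raise KeyError(f"unknown case {key!r}; known cases: {sorted(CASES)}") from None


# The single case listed in the table above.
register(
    Case(
        key="EU",
        description="European Union interinstitutional legislative procedures",
        status="In development",
    )
)

print(get_case("EU").status)
```

Keeping each case behind a single key means a new jurisdiction could be added by registering another entry, without touching the code for existing cases.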

Related projects

Project	Role
openstage	Data models, adapters, and codebooks
openbasement	Template-based RDF extraction from EU Cellar data
backstage (this project)	Data collection, processing, and publishing
