Running the Pipeline
Prerequisites
backstage requires Python 3.11+ and uses uv for environment management.
# Install dependencies
uv sync
# For parsing (requires openbasement)
uv sync --extra parsing
# Environment (add your Hetzner S3 credentials)
cp .env.example .env
source .env
Full pipeline
Run all steps for a case:
python -m flows.run --case eu --sample # sample mode (~10 procedures)
python -m flows.run --case eu # full production run
python -m flows.run --case eu --dry-run # skip Dataverse publishing
Incremental update
Collect new procedures and download only missing RDFs, then re-parse and package:
python -m flows.run --case eu --download-mode incremental
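As a rough illustration of what "download only missing RDFs" could mean, the sketch below filters a list of procedure IDs down to those with no usable local file yet. The function name `missing_downloads`, the `<id>.rdf` naming scheme, and the flat directory layout are all assumptions made for this example; backstage's actual download step may track state differently.

```python
from pathlib import Path


def missing_downloads(procedure_ids, rdf_dir):
    """Return the IDs that have no non-empty .rdf file in rdf_dir yet.

    Hypothetical helper: assumes one file per procedure named <id>.rdf.
    """
    rdf_dir = Path(rdf_dir)
    missing = []
    for pid in procedure_ids:
        target = rdf_dir / f"{pid}.rdf"
        # Treat empty files as missing so interrupted downloads are retried.
        if not target.is_file() or target.stat().st_size == 0:
            missing.append(pid)
    return missing
```

Treating zero-byte files as missing is a design choice in this sketch: it lets an incremental run recover from a download that was interrupted before any data was written.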
Specific steps
Run only selected steps:
python -m flows.run --case eu --steps collect download
python -m flows.run --case eu --steps parse
python -m flows.run --case eu --steps publish --dry-run # publish step only, as a dry run
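One way a runner might resolve `--steps` is sketched below: validate the requested names, then run them in the pipeline's own order regardless of how they were typed. The step list in `DEFAULT_STEPS` only contains the names that appear in the commands above and is an assumption (the real pipeline may define more steps, such as packaging); `select_steps` is a hypothetical helper, not part of backstage.

```python
# Assumed step order, taken from the step names used in this section.
DEFAULT_STEPS = ("collect", "download", "parse", "publish")


def select_steps(requested=None, available=DEFAULT_STEPS):
    """Return the steps to run, in pipeline order.

    With no request, every available step is returned.
    """
    if not requested:
        return list(available)
    unknown = [step for step in requested if step not in available]
    if unknown:
        raise ValueError(f"Unknown step(s): {', '.join(unknown)}")
    # Iterate over `available` so the pipeline order wins over CLI order.
    return [step for step in available if step in requested]
```

With this approach, `--steps parse collect` and `--steps collect parse` would behave identically. Whether the real runner reorders steps this way is not stated here.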
Standalone step execution
Each step can also run independently:
python -m flows.eu.collect --sample
python -m flows.eu.download --sample
python -m flows.eu.parse --sample
python -m flows.eu.publish --dry-run
All cases
Run the pipeline for all configured cases:
python -m flows.run --case all --sample --dry-run
CLI flags
| Flag | Description |
|---|---|
| --case | Case to run (eu, all). Default: eu |
| --steps | Specific steps to run. Default: all steps for the case |
| --sample | Limit to ~10 procedures for testing |
| --sample-limit N | Override sample size (default: 10) |
| --dry-run | Skip destructive operations (Dataverse publishing) |
| --download-mode | full (re-download all) or incremental (only new). Default: full |
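
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that mirrors the table. It is an illustration only: `build_parser` is a hypothetical name, and the actual parser in `flows.run` may be written with a different library or accept additional options.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="flows.run")
    parser.add_argument("--case", choices=["eu", "all"], default="eu",
                        help="Case to run")
    # None means "all steps for the case".
    parser.add_argument("--steps", nargs="+", default=None,
                        help="Specific steps to run")
    parser.add_argument("--sample", action="store_true",
                        help="Limit to a small number of procedures")
    parser.add_argument("--sample-limit", type=int, default=10, metavar="N",
                        help="Override sample size")
    parser.add_argument("--dry-run", action="store_true",
                        help="Skip destructive operations")
    parser.add_argument("--download-mode", choices=["full", "incremental"],
                        default="full", help="Download strategy")
    return parser
```

Parsing `["--steps", "collect", "download", "--sample"]` with this parser yields `steps == ["collect", "download"]`, `sample == True`, and the documented defaults for everything else.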