Documentation - imibare

QUICKSTART

From zero to DataFrame in 60 seconds

1. Install

$ pip install imibare

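For Iceberg time travel, install the optional extra: pip install "imibare[iceberg]" (the quotes keep shells such as zsh from treating the brackets as a glob pattern).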

2. Load data

import imibare as imi

# List all available datasets
catalog = imi.catalog()
print(f"{len(catalog)} datasets available")

# Load Rwanda CPI data
df = imi.load("rw.nisr.cpi.monthly")
print(df.tail())

# Load with Polars engine
df_polars = imi.load("rw.nisr.cpi.monthly", engine="polars")

# Versioned load (requires imibare[iceberg])
df_2025 = imi.load("rw.nisr.cpi.monthly", version="2025-06-01")

3. Query with DuckDB

import duckdb

duckdb.sql("""
    ATTACH 'https://catalog.cloudflarestorage.com/imibare-data'
    AS imi (TYPE ICEBERG, READ_ONLY)
""")

# List all tables, including those in the attached catalog
duckdb.sql("SHOW ALL TABLES").show()

# Query Rwanda CPI
duckdb.sql("""
    SELECT date, category, value
    FROM imi.rw_nisr_cpi_monthly
    WHERE date >= '2024-01-01'
    ORDER BY date DESC
    LIMIT 12
""").show()

DATASET ID CONVENTION

All dataset IDs use four dot-separated segments: {country}.{institution}.{topic}.{frequency}

Examples: rw.nisr.cpi.monthly · rw.bnr.fx.daily · rw.rra.revenue.monthly
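
The convention above can be checked mechanically. The following is a minimal sketch, not part of the imibare package: parse_dataset_id and table_name are hypothetical helper names, and the allowed character set for a segment (lowercase letters, digits, underscores) is an assumption, since the examples only show lowercase letters. The underscore form mirrors the table name used in the DuckDB query in the quickstart.

```python
import re
from typing import NamedTuple

# Assumption: a segment is lowercase letters, digits, or underscores.
# The documented examples only show lowercase alphabetic segments.
_SEGMENT = re.compile(r"[a-z0-9_]+")


class DatasetId(NamedTuple):
    country: str
    institution: str
    topic: str
    frequency: str


def parse_dataset_id(dataset_id: str) -> DatasetId:
    """Split an ID into its four segments, raising ValueError if malformed."""
    parts = dataset_id.split(".")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 dot-separated segments, got {len(parts)}: {dataset_id!r}"
        )
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid segment {part!r} in {dataset_id!r}")
    return DatasetId(*parts)


def table_name(dataset_id: str) -> str:
    """Map an ID to the underscore form, e.g. for use as a SQL table name."""
    return "_".join(parse_dataset_id(dataset_id))


print(parse_dataset_id("rw.bnr.fx.daily"))
print(table_name("rw.nisr.cpi.monthly"))
```

Validating the ID before building a URL or a SQL identifier from it keeps malformed input from reaching the API or the query.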

API REFERENCE

Base URL: https://api.imibare.org

GET /v1/datasets?institution=nisr&topic=prices&frequency=monthly&pipeline=automated

List all datasets. Supports filtering by institution, topic, frequency, pipeline.

curl "https://api.imibare.org/v1/datasets?institution=nisr"

GET /v1/datasets/:id

Get metadata for a single dataset by ID.

curl "https://api.imibare.org/v1/datasets/rw.nisr.cpi.monthly"

GET /v1/data/:id/download

Download the latest Parquet file for a dataset, streamed directly from Cloudflare R2 storage.

curl -L "https://api.imibare.org/v1/data/rw.nisr.cpi.monthly/download" -o cpi.parquet
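
The same download can be scripted with the Python standard library. This is a sketch under stated assumptions rather than an official client: download_url and stream_to_file are hypothetical helper names, and the only facts taken from the reference are the base URL and the endpoint path.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "https://api.imibare.org"


def download_url(dataset_id: str, base_url: str = BASE_URL) -> str:
    """Build the download endpoint URL for a dataset ID."""
    return f"{base_url}/v1/data/{dataset_id}/download"


def stream_to_file(url: str, dest, chunk_size: int = 1 << 20) -> int:
    """Copy the response body to dest in chunks; return the bytes written."""
    dest = Path(dest)
    # urlopen follows HTTP redirects by default, much like curl -L.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size
```

Calling stream_to_file(download_url("rw.nisr.cpi.monthly"), "cpi.parquet") would mirror the curl command above. Copying in chunks avoids holding the whole Parquet file in memory.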

GET /v1/search?q=consumer+price

Full-text search across dataset names, institutions, topics, and tags.

curl "https://api.imibare.org/v1/search?q=gdp"
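
For scripted access, the listing and search URLs can be built with proper query encoding. The sketch below assumes only what the reference states (the base URL, the paths, and the four filter names); datasets_url, search_url, and get_json are hypothetical helpers, and get_json additionally assumes the API responds with JSON, which the reference does not spell out.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.imibare.org"

# Filter names listed for GET /v1/datasets in the reference above.
DATASET_FILTERS = ("institution", "topic", "frequency", "pipeline")


def datasets_url(**filters: str) -> str:
    """Build a GET /v1/datasets URL, rejecting filters the docs do not list."""
    unknown = sorted(set(filters) - set(DATASET_FILTERS))
    if unknown:
        raise ValueError(f"unsupported filters: {unknown}")
    query = urlencode(filters)
    return f"{BASE_URL}/v1/datasets" + (f"?{query}" if query else "")


def search_url(q: str) -> str:
    """Build a GET /v1/search URL; urlencode turns spaces into '+'."""
    return f"{BASE_URL}/v1/search?" + urlencode({"q": q})


def get_json(url: str):
    """Fetch a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


print(datasets_url(institution="nisr", frequency="monthly"))
print(search_url("consumer price"))
```

The two print calls only build URLs and make no network request; pass either result to get_json to perform the call.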

CONTRIBUTING

Add a new dataset in 4 steps

1. Create a Marimo notebook

In notebooks/rw/, create a notebook that fetches, explores, cleans, and validates the data. Follow the four-section structure in NOTEBOOKS.md. The notebook must pass the graduation checklist before moving to step 2.

2. Promote the cleaning function

Copy the extraction function from the notebook into the appropriate file in pipeline/lib/ (excel_extract.py, json_extract.py, or pdf_extract.py). Write unit tests.

3. Add an entry to datasets.yaml

Follow the schema in pipeline/catalog/datasets.yaml. Dataset IDs must have four segments and always start with the country code: rw.{institution}.{topic}.{frequency}. Run the YAML validation tests.

4. Write the Dagster asset

Create pipeline/assets/rw/{institution}_{topic}.py following the thin-wrapper pattern. The asset only fetches, extracts, validates, and uploads; all cleaning logic stays in the function promoted to pipeline/lib/.
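
To illustrate the thin-wrapper idea, here is a dependency-free sketch. It is not the project's actual asset code: extract_cpi, validate, and build_asset are hypothetical names, the CSV input is made up for the example, and the Dagster decorator is deliberately left out so the snippet runs without Dagster installed.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]


def extract_cpi(raw: bytes) -> Rows:
    """Hypothetical stand-in for a function promoted into pipeline/lib/."""
    rows = []
    for line in raw.decode("utf-8").splitlines()[1:]:  # skip the header row
        date, value = line.split(",")
        rows.append({"date": date, "value": float(value)})
    return rows


def validate(rows: Rows) -> Rows:
    """Hypothetical check: fail loudly rather than upload an empty table."""
    if not rows:
        raise ValueError("no rows extracted")
    return rows


def build_asset(
    fetch: Callable[[], bytes],
    extract: Callable[[bytes], Rows],
    upload: Callable[[Rows], None],
) -> Callable[[], int]:
    """Compose the four steps; the returned function holds no cleaning logic."""

    def asset_fn() -> int:
        rows = validate(extract(fetch()))
        upload(rows)
        return len(rows)

    # In the real pipeline this function would presumably be registered with
    # Dagster's @asset decorator; that step is omitted here on purpose.
    return asset_fn


uploaded: Rows = []
cpi_asset = build_asset(
    fetch=lambda: b"date,value\n2024-01-01,101.5\n2024-02-01,102.1\n",
    extract=extract_cpi,
    upload=uploaded.extend,
)
print(cpi_asset(), "rows uploaded")
```

Because the wrapper only wires the steps together, the extraction function can be unit-tested on its own, and fetch and upload can be swapped for in-memory fakes as done here.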

↗ Full CONTRIBUTING.md on GitHub
