Metaminutes

February 8, 2026

How CivicBand Works, Feb 2026 Edition

Updated March 6, 2025

This is going to be pretty technical. I'm going to explain in reasonable depth how the CivicBand pipeline and hosting work, linking to resources where useful.

Key Architectural Decisions

  1. Each municipality is its own subdomain, which includes state information, e.g. alameda.ca or edmonton.ab.canada.

  2. Only the CivicBand pipeline writes to the database or to each site's static storage. There are no "live" writes from the website.

  3. Processing happens locally, serving happens in the cloud
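To make the subdomain convention concrete, here is a minimal Python sketch that splits a subdomain into its parts and builds a public hostname from it. The `SiteAddress` type and `parse_subdomain` function are hypothetical names for illustration, not part of Clerk. The hostname rule (subdomain plus `.civic.band`) is an assumption based on the wildcard certs described later, and may not hold for every country.

```python
from dataclasses import dataclass
from typing import Optional

BASE_DOMAIN = "civic.band"  # from the homepage URL, https://civic.band


@dataclass(frozen=True)
class SiteAddress:
    """Hypothetical container for the parts of a CivicBand subdomain."""

    municipality: str
    state: str
    country: Optional[str]  # None when the subdomain has no country segment

    @property
    def subdomain(self) -> str:
        parts = [self.municipality, self.state]
        if self.country:
            parts.append(self.country)
        return ".".join(parts)

    @property
    def hostname(self) -> str:
        # Assumption: the public hostname is the subdomain under civic.band.
        return f"{self.subdomain}.{BASE_DOMAIN}"


def parse_subdomain(subdomain: str) -> SiteAddress:
    """Split 'alameda.ca' or 'edmonton.ab.canada' into named parts."""
    parts = subdomain.lower().split(".")
    if len(parts) == 2:
        return SiteAddress(parts[0], parts[1], None)
    if len(parts) == 3:
        return SiteAddress(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected subdomain shape: {subdomain!r}")
```

For example, `parse_subdomain("alameda.ca").hostname` gives `alameda.ca.civic.band` under the assumption above.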

Clerk and CivicBand are Friends

The centerpiece of the CivicBand pipeline is the Clerk package. Clerk is CivicBand's Python CLI for managing processing. Clerk contains components for:

  • Core fetching and parsing utilities

  • Core database management

  • Core worker management (via RQ)

  • OCR and parsing of documents (via tesseract)

  • Plugin management (via pluggy)

  • Logging management and exception handling

Clerk does not contain components for:

  • Parsing and fetching the meeting meta info for municipalities

  • Uploading static assets to the cloud

  • Deploying a new site DB and config to production

Those components are plugins defined in the civic-band repo, which is private to CivicBand maintainers and volunteers only.

How a municipality is added to CivicBand

  1. On the production pipeline machine (not the production server), a Python environment using uv has a checkout of civic-band, which installs clerk.

  2. uv run clerk new starts the process for creating a new site, and I put in:

    1. subdomain (eg, alameda.ca)

    2. name

    3. state

    4. country

    5. kind (county, city, town, etc)

    6. start year (how far back to look)

    7. Lat / Lng (for the map on https://civic.band)

    8. Fetcher (legistar, granicus, civicclerk, agendacenter, etc)

    9. Extra info needed for the fetcher type, which is provided by the Fetcher subclasses in the civic-band repo.

  3. Clerk kicks off fetcher jobs, using the worker processes on the pipeline boxes

    1. These fetchers grab either API endpoints or HTML pages, and from those attempt to extract:

      • Meeting name

      • Meeting date

      • Minutes PDF Link

      • Agenda PDF Link

    2. That data is then used to build the structure of the data storage for the sites directory: sites/{subdomain}/pdfs/{meeting name}/{meeting date}.pdf, e.g. sites/alameda.ca/pdfs/CityCouncil/2025-11-18.pdf

  4. Once the fetcher jobs are complete, the OCR jobs kick in. Each OCR job processes a full PDF, and does the following:

    1. Split the PDF into page images

    2. OCR each page image into a text file, which gets stored in a directory of the form: sites/{subdomain}/txt/{meeting name}/{meeting date}/{page_number}.txt, e.g. sites/alameda.ca/txt/CityCouncil/2025-11-18/1.txt

    3. Upload the page image to our static store (today: bunny.net. in-progress: Fastly, eta mid-April)

    4. Create a sentinel text file in sites/{subdomain}/processed/{meeting name}/{meeting date}.txt so we know we've processed that document and can delete it from disk.

  5. Once the OCR jobs are complete, the DB compilation job runs. This job:

    1. Builds a sqlite DB from the full municipality text files

    2. Turns on sqlite FTS and builds the FTS indexes

  6. Once the DB compilation job is done, the DB is uploaded to the production server's sites directory, along with an updated sites.db that is used to generate the homepage for https://civic.band
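The DB compilation step can be sketched with nothing but the standard library. This is a rough illustration, not Clerk's actual job: the `pages` table, its columns, and the `build_site_db` name are all assumptions, and it assumes the local SQLite build includes FTS5 (the post only says "sqlite FTS", so the real pipeline may use a different FTS version or tooling).

```python
import sqlite3
from pathlib import Path


def build_site_db(txt_root: Path, db_path: Path) -> int:
    """Compile every OCR'd page under txt_root into a SQLite DB with an FTS index.

    Returns the number of pages inserted. Table and column names are hypothetical.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE pages ("
            "id INTEGER PRIMARY KEY, meeting TEXT, date TEXT, page INTEGER, text TEXT)"
        )
        rows = []
        # Expected layout: {txt_root}/{meeting name}/{meeting date}/{page_number}.txt
        for path in sorted(txt_root.glob("*/*/*.txt")):
            meeting, date = path.parts[-3], path.parts[-2]
            text = path.read_text(encoding="utf-8")
            rows.append((meeting, date, int(path.stem), text))
        conn.executemany(
            "INSERT INTO pages (meeting, date, page, text) VALUES (?, ?, ?, ?)",
            rows,
        )
        # External-content FTS5 table: the index points back at pages.id
        # instead of storing a second copy of the text.
        conn.execute(
            "CREATE VIRTUAL TABLE pages_fts "
            "USING fts5(text, content='pages', content_rowid='id')"
        )
        conn.execute("INSERT INTO pages_fts (rowid, text) SELECT id, text FROM pages")
        conn.commit()
        return len(rows)
    finally:
        conn.close()
```

A search then joins the index back to the page rows, for example `SELECT p.meeting, p.date, p.page FROM pages_fts JOIN pages p ON p.id = pages_fts.rowid WHERE pages_fts MATCH 'budget'`.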

Processing Errata

  • CivicBand started by only indexing minutes, and added agendas later. So for agendas, the on-disk storage path is sites/{subdomain}/_agendas/[pdfs, processed, txt]/...

  • For similar reasons, the storage path in the cloud is:

    • {subdomain}/{meeting name}/{meeting date}/{page number}.png

    • e.g. alameda.ca/CityCouncil/2025-11-18/1.png

    • Unless it's an agenda page, in which case _agendas goes between {subdomain} and {meeting name}

  • Tech debt is hard
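The on-disk and cloud layouts described above, including the _agendas special case, can be captured in a handful of path helpers. This is a sketch of the conventions as written in this post; the function names and the `is_agenda` flag are hypothetical, not Clerk's API.

```python
from pathlib import Path

AGENDAS_DIR = "_agendas"


def _site_base(sites_root: Path, subdomain: str, is_agenda: bool) -> Path:
    # Agendas were added after minutes, so they live one level down in _agendas.
    base = sites_root / subdomain
    return base / AGENDAS_DIR if is_agenda else base


def pdf_path(sites_root: Path, subdomain: str, meeting: str, date: str,
             is_agenda: bool = False) -> Path:
    """Where a fetched PDF is stored on the pipeline machine."""
    return _site_base(sites_root, subdomain, is_agenda) / "pdfs" / meeting / f"{date}.pdf"


def txt_path(sites_root: Path, subdomain: str, meeting: str, date: str,
             page: int, is_agenda: bool = False) -> Path:
    """Where the OCR text for one page is stored."""
    base = _site_base(sites_root, subdomain, is_agenda)
    return base / "txt" / meeting / date / f"{page}.txt"


def sentinel_path(sites_root: Path, subdomain: str, meeting: str, date: str,
                  is_agenda: bool = False) -> Path:
    """Marker file recording that a document has been fully processed."""
    base = _site_base(sites_root, subdomain, is_agenda)
    return base / "processed" / meeting / f"{date}.txt"


def is_processed(sites_root: Path, subdomain: str, meeting: str, date: str,
                 is_agenda: bool = False) -> bool:
    """True if the sentinel exists, meaning the PDF can be skipped or deleted."""
    return sentinel_path(sites_root, subdomain, meeting, date, is_agenda).exists()


def cdn_key(subdomain: str, meeting: str, date: str, page: int,
            is_agenda: bool = False) -> str:
    """Object key for a page image in cloud storage."""
    parts = [subdomain]
    if is_agenda:
        parts.append(AGENDAS_DIR)
    parts += [meeting, date, f"{page}.png"]
    return "/".join(parts)
```

Keeping the agenda switch in one place (`_site_base` and `cdn_key`) means the rest of a pipeline would never need to know that agendas were bolted on later.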

To Serve Muni

CivicBand today runs entirely on 4 docker images on one server. All of the code for this is in corkboard.

  • bunny.net serves DNS, which routes to Fastly’s CDN and caching layer, which routes to a single Hetzner box

  • That box runs caddy, which has N+1 host entries, where N is the number of states being served by CivicBand

    • Because I need multi-level wildcards and I'm not enough of a network engineer to figure this out, I previously used Let's Encrypt / certbot to generate and update a wildcard cert for each state, e.g. *.ca.civic.band, *.bc.civic.band

    • Fastly now terminates and maintains our TLS certs, so I don’t have to think about it.

    • The 1 is the homepage, civic.band

  • All of these entries are run by the same Django app, which performs two operations:

    • Serves the homepage, https://civic.band

    • Uses a Django plugin (built with Simon Willison's djp) that checks incoming requests for subdomains and, if there's a match in sites.db (see above), returns a Datasette ASGI response instead of a Django ASGI response. Yes, this means I initialize a new Datasette "instance" for each matching request, but it turns out that's what Datasette does under the hood; I'm just doing it with Django instead of Datasette's ASGI runner.

    • The state-level Caddy entries go through anubis instances first, because the bot traffic against CivicBand is ridiculous.

  • I have a number of custom plugins for Datasette, which you can see in corkboard. This includes a plugin which takes the "image URL" column in the sqlite DB and replaces it with an img tag loading from our CDN

    • The CDN is, today, Fastly fronting bunny.net storage

    • I'm in the process of moving to Fastly, because CivicBand has been accepted into the Fast Forward program.
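The routing decision the Django plugin makes can be sketched in plain Python. This is only the decision logic, not the real djp plugin or any Datasette call: the `sites` table with a `subdomain` column is an assumed schema for sites.db, the function names are hypothetical, and falling back to Django for unknown subdomains is a guess at the behavior.

```python
import sqlite3
from typing import Optional

BASE_DOMAIN = "civic.band"


def subdomain_from_host(host: str) -> Optional[str]:
    """Return 'alameda.ca' for 'alameda.ca.civic.band', or None for the homepage."""
    # Drop any port and normalise case and a trailing dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    return hostname[: -len(suffix)] or None


def choose_backend(host: str, sites_db: sqlite3.Connection) -> str:
    """Decide which app should answer a request, based on its Host header.

    Returns "datasette" when the subdomain is a known site, else "django".
    """
    subdomain = subdomain_from_host(host)
    if subdomain is None:
        return "django"
    row = sites_db.execute(
        "SELECT 1 FROM sites WHERE subdomain = ?", (subdomain,)
    ).fetchone()
    return "datasette" if row else "django"
```

In a real ASGI app, the "datasette" branch is where a Datasette application for that site's DB would be built and awaited with the same scope, receive, and send.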

Are you there, logs? It's me, Philip

Monitoring is critical to making sure CivicBand is performing the way I want it to. There are two major components to our observability.

  • Umami, which does anonymized product analytics so I can see where load is coming from and what searches / municipalities people are most interested in. This has helped me catch so many bots.

  • Grafana / Loki, which takes all the system metrics and logs across the fleet and makes them legible and searchable and graphable.

I also have a Metabase set up, which lets me do pretty graphs like the Core Metrics and Sites Metrics.

That's all folks

The past couple months have been slower on the "site adding" front as I've invested a ton of time into infra improvements. My goal has been to get to a place where CivicBand could handle a 10x increase in the number of municipalities tracked, and I'm nearly there. My goal for this year is to get to 100 million pages under index, and there's more to come on how we're going to make that visible. Stay tuned, and thanks for joining us.

Philip

Join the discussion:
  1. N
    Marc Joffe
    February 8, 2026, morning

    What, if anything, can users not on your core team do to help add new cities and districts to Civic Band?
