How CivicBand Works, Feb 2026 Edition
Updated March 6, 2025
This is going to be pretty technical. I'm going to explain in reasonable depth how the CivicBand pipeline and hosting work, linking to resources where useful.
Key Architectural Decisions
Each municipality gets its own subdomain that includes state information, e.g., alameda.ca or edmonton.ab.canada.
Only the CivicBand pipeline writes to the database or to each site's static storage. There are no "live" writes from the website.
Processing happens locally; serving happens in the cloud.
Clerk and CivicBand are Friends
The centerpiece of the CivicBand pipeline is the Clerk package. Clerk is CivicBand's Python CLI for managing processing. Clerk contains components for:
Core fetching and parsing utilities
Core database management
Core worker management (via RQ)
OCR and parsing of documents (via tesseract)
Plugin management (via pluggy)
Logging management and exception handling
Clerk does not contain components for:
Parsing and fetching the meeting meta info for municipalities
Uploading static assets to the cloud
Deploying a new site DB and config to production
Those components are plugins defined in the civic-band repo, which is private and available only to CivicBand maintainers and volunteers.
How a municipality is added to CivicBand
On the production pipeline machine (not the production server), a Python environment using uv has a checkout of civic-band, which installs clerk. Running "uv run clerk new" starts the process for creating a new site, and I enter:
subdomain (e.g., alameda.ca)
name
state
country
kind (county, city, town, etc)
start year (how far back to look)
Lat / Lng (for the map on https://civic.band)
Fetcher (legistar, granicus, civicclerk, agendacenter, etc)
Extra info needed for the fetcher type, which is provided by the Fetcher subclasses in the civic-band repo.
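The answers collected by clerk new amount to a small per-site record. A minimal sketch of that record as a Python dataclass follows; the class name, field names, and validation rules are assumptions for illustration, not Clerk's actual schema.

```python
import re
from dataclasses import dataclass, field

# Hypothetical check: lowercase dot-separated labels, e.g. "alameda.ca"
# or "edmonton.ab.canada". Not taken from Clerk.
SUBDOMAIN_RE = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")


@dataclass
class SiteConfig:
    """Hypothetical shape of the answers typed into `clerk new`."""
    subdomain: str       # e.g. "alameda.ca"
    name: str            # human-readable name
    state: str
    country: str
    kind: str            # "county", "city", "town", ...
    start_year: int      # how far back the fetcher should look
    lat: float           # used for the homepage map
    lng: float
    fetcher: str         # "legistar", "granicus", "civicclerk", ...
    fetcher_extra: dict = field(default_factory=dict)  # fetcher-specific settings

    def __post_init__(self) -> None:
        # Reject obviously malformed input before anything is written.
        if not SUBDOMAIN_RE.match(self.subdomain):
            raise ValueError(f"subdomain should look like 'alameda.ca', got {self.subdomain!r}")
        if not (-90 <= self.lat <= 90) or not (-180 <= self.lng <= 180):
            raise ValueError("lat/lng out of range")
```

For example, SiteConfig("alameda.ca", "Alameda", "CA", "US", "city", 2015, 37.77, -122.27, "legistar") builds a record with illustrative values, while a subdomain such as "Alameda CA" raises ValueError.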
Clerk kicks off fetcher jobs, using the worker processes on the pipeline boxes.
These fetchers grab either API endpoints or HTML pages, and from those attempt to extract:
Meeting name
Meeting date
Minutes PDF Link
Agenda PDF Link
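Each fetcher turns a municipality's API responses or HTML pages into those four fields. The sketch below assumes a hypothetical Fetcher base class and a toy JsonApiFetcher for an invented JSON payload (the keys name, date, minutes_url, and agenda_url are made up); the real Fetcher subclasses live in the private civic-band repo and are not shown in this post. Stripping spaces from meeting names mirrors the CityCouncil example path but is also an assumption.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Meeting:
    name: str                   # e.g. "CityCouncil"
    meeting_date: date
    minutes_pdf: Optional[str]  # either link may be missing
    agenda_pdf: Optional[str]


class Fetcher:
    """Hypothetical base class: subclasses turn raw source data into Meetings."""

    def __init__(self, subdomain: str, start_year: int) -> None:
        self.subdomain = subdomain
        self.start_year = start_year

    def parse(self, raw: str) -> Iterable[Meeting]:
        raise NotImplementedError

    def meetings(self, raw: str) -> List[Meeting]:
        # Drop anything older than the configured start year.
        return [m for m in self.parse(raw) if m.meeting_date.year >= self.start_year]


class JsonApiFetcher(Fetcher):
    """Toy fetcher for an assumed JSON API returning a list of meeting objects."""

    def parse(self, raw: str) -> Iterable[Meeting]:
        for item in json.loads(raw):
            yield Meeting(
                name=item["name"].replace(" ", ""),  # "City Council" -> "CityCouncil"
                meeting_date=date.fromisoformat(item["date"][:10]),
                minutes_pdf=item.get("minutes_url") or None,  # "" counts as missing
                agenda_pdf=item.get("agenda_url") or None,
            )
```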
That data is then used to build the structure of the data storage for the sites directory: sites/{subdomain}/pdfs/{meeting name}/{meeting date}.pdf, e.g., sites/alameda.ca/pdfs/CityCouncil/2025-11-18.pdf
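Building that path is a small pure function. A sketch, with the helper name pdf_path and the rejection of slashes in meeting names as assumptions:

```python
from datetime import date
from pathlib import PurePosixPath


def pdf_path(subdomain: str, meeting_name: str, meeting_date: date,
             root: str = "sites") -> PurePosixPath:
    """Build sites/{subdomain}/pdfs/{meeting name}/{meeting date}.pdf (hypothetical helper)."""
    if "/" in meeting_name:
        # A slash would silently create an extra directory level.
        raise ValueError(f"meeting name must not contain '/': {meeting_name!r}")
    return PurePosixPath(root, subdomain, "pdfs", meeting_name,
                         f"{meeting_date.isoformat()}.pdf")
```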
Once the fetcher jobs are complete, the OCR jobs kick in. Each OCR job processes a full PDF, and does the following:
Split the PDF into page images
OCR each page image into a text file, stored at a path of the form sites/{subdomain}/txt/{meeting name}/{meeting date}/{page_number}.txt, e.g., sites/alameda.ca/txt/CityCouncil/2025-11-18/1.txt
Upload the page image to our static store (today: bunny.net; in progress: Fastly, ETA mid-April)
Create a sentinel text file at sites/{subdomain}/processed/{meeting name}/{meeting date}.txt so we know we've processed that document and can delete it from disk.
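The per-document OCR steps can be sketched with PDF splitting, OCR, and upload passed in as callables, which keeps the sketch stdlib-only. The function and parameter names are hypothetical; in practice ocr_page might wrap Tesseract and upload_image the static store client, but neither is shown here. Writing the sentinel last means a crash partway through leaves the document to be retried.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_ocr_job(
    sites_root: Path,
    subdomain: str,
    meeting_name: str,
    meeting_date: str,                               # e.g. "2025-11-18"
    split_pages: Callable[[Path], Iterable[bytes]],  # PDF -> page images (assumed helper)
    ocr_page: Callable[[bytes], str],                # page image -> text
    upload_image: Callable[[str, bytes], None],      # (cloud key, image) -> None
) -> Path:
    site = sites_root / subdomain
    pdf = site / "pdfs" / meeting_name / f"{meeting_date}.pdf"
    sentinel = site / "processed" / meeting_name / f"{meeting_date}.txt"
    if sentinel.exists():
        return sentinel  # already processed; nothing to do

    txt_dir = site / "txt" / meeting_name / meeting_date
    txt_dir.mkdir(parents=True, exist_ok=True)
    for page_number, image in enumerate(split_pages(pdf), start=1):
        (txt_dir / f"{page_number}.txt").write_text(ocr_page(image))
        upload_image(f"{subdomain}/{meeting_name}/{meeting_date}/{page_number}.png", image)

    # Only mark the document done once every page succeeded.
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.write_text("done\n")
    pdf.unlink(missing_ok=True)  # the sentinel now records that this PDF was handled
    return sentinel
```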
Once the OCR jobs are complete, the DB compilation job runs. This job:
Builds a sqlite DB from all of the municipality's text files
Turns on sqlite FTS (full-text search) and builds the FTS indexes
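A minimal sketch of the compilation step using Python's built-in sqlite3 module. It assumes the SQLite build includes FTS5, and the pages table layout and build_site_db name are hypothetical; the real schema is not described in this post.

```python
import sqlite3
from pathlib import Path


def build_site_db(txt_root: Path, db_path: Path) -> None:
    """Compile txt/{meeting}/{date}/{page}.txt files into one SQLite DB with an FTS5 index."""
    db_path.unlink(missing_ok=True)  # rebuild from scratch each run
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE pages (id INTEGER PRIMARY KEY, meeting TEXT, date TEXT,"
            " page INTEGER, text TEXT)"
        )
        rows = (
            (p.parent.parent.name, p.parent.name, int(p.stem), p.read_text())
            for p in sorted(txt_root.glob("*/*/*.txt"))
        )
        conn.executemany(
            "INSERT INTO pages (meeting, date, page, text) VALUES (?, ?, ?, ?)", rows
        )
        # External-content FTS5 table over pages.text, then populate the index.
        conn.execute(
            "CREATE VIRTUAL TABLE pages_fts USING fts5(text, content='pages', content_rowid='id')"
        )
        conn.execute("INSERT INTO pages_fts(pages_fts) VALUES ('rebuild')")
        conn.commit()
    finally:
        conn.close()
```

Searching then joins the index back to the pages table, e.g. SELECT pages.meeting, pages.date, pages.page FROM pages_fts JOIN pages ON pages.id = pages_fts.rowid WHERE pages_fts MATCH 'budget'.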
Once the DB compilation job is done, the DB is uploaded to the production server's sites directory, along with an updated sites.db that is used to generate the homepage for https://civic.band.
Processing Errata
CivicBand started by indexing only minutes and added agendas later. So for agendas, the on-disk storage path is sites/{subdomain}/_agendas/[pdfs, processed, txt]/...
For similar reasons, the storage path in the cloud is {subdomain}/{meeting name}/{meeting date}/{page number}.png, e.g., alameda.ca/CityCouncil/2025-11-18/1.png, unless it's an agenda page, in which case _agendas goes between {subdomain} and {meeting name}.
Tech debt is hard
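Those two quirks boil down to a couple of path helpers. A sketch, with hypothetical function names:

```python
from pathlib import PurePosixPath


def local_site_dir(subdomain: str, agenda: bool = False, root: str = "sites") -> PurePosixPath:
    """Minutes live under sites/{subdomain}/; agendas under sites/{subdomain}/_agendas/."""
    base = PurePosixPath(root, subdomain)
    return base / "_agendas" if agenda else base


def cloud_key(subdomain: str, meeting_name: str, meeting_date: str, page: int,
              agenda: bool = False) -> str:
    """{subdomain}/[_agendas/]{meeting name}/{meeting date}/{page number}.png"""
    prefix = [subdomain, "_agendas"] if agenda else [subdomain]
    return "/".join([*prefix, meeting_name, meeting_date, f"{page}.png"])
```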
To Serve Muni
CivicBand today runs entirely on 4 Docker images on one server. All of the code for this is in corkboard.
bunny.net serves DNS, which routes to Fastly’s CDN and caching layer, which routes to a single Hetzner box.
That box runs Caddy, which has N+1 host entries, where N is the number of states being served by CivicBand.
Because I need multi-level wildcards and I'm not enough of a network engineer to figure this out, I use Let's Encrypt / certbot to generate and update a wildcard cert for each state, e.g., *.ca.civic.band, *.bc.civic.band.
Fastly now terminates and maintains our TLS certs, so I don’t have to think about it.
The 1 is the homepage, civic.band.
All of these entries are run by the same Django app, which performs two operations:
Serves the homepage, https://civic.band
Uses a Django plugin (built with Simon Willison's djp) that checks incoming requests for subdomains and, if there's a match in the sites.db (see above), returns a Datasette ASGI response instead of a Django ASGI response. Yes, this means I initialize a new Datasette "instance" for each matching request, but it turns out that's what Datasette does under the hood; I'm just doing it with Django instead of Datasette's ASGI runner.
The state-level Caddy entries go through anubis instances first, because the bot traffic against CivicBand is ridiculous.
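The serving setup in this section can be sketched in two pieces: rendering the N+1 Caddy host entries, and the subdomain check the Django plugin performs before handing off to Datasette. This is stdlib-only and illustrative: the upstream ports, the single shared anubis upstream, the sites table with a subdomain column, and the string return values of route are all assumptions, and TLS settings are omitted. The real code is in corkboard and uses djp, Django, and Datasette, none of which are imported here.

```python
import sqlite3
from typing import List, Optional

BASE_DOMAIN = "civic.band"


def caddyfile(states: List[str], django: str = "127.0.0.1:8000",
              anubis: str = "127.0.0.1:8923") -> str:
    """Render N+1 Caddy site blocks: one wildcard per state, plus the homepage."""
    blocks = [f"{BASE_DOMAIN} {{\n\treverse_proxy {django}\n}}"]
    for state in sorted(states):
        # State-level hosts go to anubis first; anubis then forwards to Django.
        blocks.append(f"*.{state}.{BASE_DOMAIN} {{\n\treverse_proxy {anubis}\n}}")
    return "\n\n".join(blocks) + "\n"


def subdomain_from_host(host: str) -> Optional[str]:
    """'alameda.ca.civic.band:443' -> 'alameda.ca'; the bare homepage -> None."""
    host = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    return host[: -len(suffix)] or None


def route(host: str, sites_db: sqlite3.Connection) -> str:
    """Return 'datasette:<subdomain>' for known sites, else 'django' (placeholder values)."""
    sub = subdomain_from_host(host)
    if sub is not None:
        row = sites_db.execute(
            "SELECT 1 FROM sites WHERE subdomain = ?", (sub,)
        ).fetchone()
        if row is not None:
            return f"datasette:{sub}"
    return "django"
```

Falling back to Django for unknown subdomains matches the behavior described above: only a match in sites.db switches the response to Datasette.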
I have a number of custom plugins for Datasette, which you can see in corkboard. This includes a plugin that takes the "image URL" column in the sqlite DB and replaces it with an img tag loading from our CDN.
The CDN is, today, Fastly fronting bunny.net storage.
I'm in the process of moving to Fastly, because CivicBand has been accepted into the Fast Forward program.
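The image column rewrite is mostly string handling. In Datasette, this kind of per-cell rewriting is what the render_cell plugin hook is for. The sketch below shows only the HTML-building part. It assumes the column stores a CDN-relative path such as alameda.ca/CityCouncil/2025-11-18/1.png, and it uses a placeholder CDN host and a hypothetical function name.

```python
from html import escape
from urllib.parse import quote

CDN_BASE = "https://cdn.example.com"  # placeholder, not CivicBand's real CDN host


def image_cell_html(path: str, alt: str = "Meeting page") -> str:
    """Turn a stored image path into an <img> tag pointing at the CDN."""
    url = f"{CDN_BASE}/{quote(path.lstrip('/'))}"  # percent-encode, keep slashes
    # Escape attribute values so odd data in the DB cannot break out of the tag.
    return f'<img src="{escape(url, quote=True)}" alt="{escape(alt, quote=True)}" loading="lazy">'
```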
Are you there, logs? It's me, Philip
Monitoring is critical to making sure CivicBand is performing the way I want it to. There are two major components to our observability.
Umami, which does anonymized product analytics so I can see where load is coming from and what searches / municipalities people are most interested in. This has helped me catch so many bots.
Grafana / Loki, which takes all the system metrics and logs across the fleet and makes them legible and searchable and graphable.
I also have a Metabase instance set up, which lets me make pretty graphs like the Core Metrics and Sites Metrics.
That's all folks
The past couple months have been slower on the "site adding" front as I've invested a ton of time into infra improvements. My goal has been to get to a place where CivicBand could handle a 10x increase in the number of municipalities tracked, and I'm nearly there. My goal for this year is to get to 100 million pages under index, and there's more to come on how we're going to make that visible. Stay tuned, and thanks for joining us.
Philip