GovDataHub — Collecting Public Data

Government data is public. Finding it in a form you can actually use is another story.

I started GovDataHub because I kept hitting the same friction: interesting datasets scattered across federal agencies, inconsistent formats, and no single place to bookmark what I had already explored.

This is a dev log, not a launch announcement. The project is paused while I focus on other work.

The problem I am solving

When I research a topic — housing, transit, agriculture, whatever — I end up with:

A dozen browser tabs across .gov domains
CSV downloads with incompatible schemas
No memory of which dataset I already evaluated and rejected

GovDataHub is my attempt to build a personal index: sources I care about, metadata that helps me compare them, and a record of what I have already explored, so I can pick up where I left off instead of starting from zero each time.

Design constraints

Personal tool rules apply:

Start small — One domain I actually research beats a generic crawler for everything
Provenance matters — Every dataset needs source URL, retrieval date, and license notes
Boring storage first — SQLite or Postgres before anything clever
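
To make the provenance rule concrete, here is a minimal sketch of a record that refuses to exist without a source URL, retrieval date, and license notes. It is an illustration, not the project's actual code: the Provenance class, its field names, the is_stale helper, and the example.gov URL are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record; names are illustrative only."""
    source_url: str
    retrieved_on: date
    license_notes: str

    def __post_init__(self):
        # Reject anything that is not an absolute http(s) URL.
        parsed = urlparse(self.source_url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {self.source_url!r}")
        # License notes are required, even if they only say "unknown".
        if not self.license_notes.strip():
            raise ValueError("license_notes must not be empty")


def is_stale(record: Provenance, today: date, max_age_days: int) -> bool:
    """Return True if the dataset was retrieved more than max_age_days ago."""
    return (today - record.retrieved_on).days > max_age_days


# Placeholder values, not a real dataset.
record = Provenance(
    source_url="https://example.gov/data/housing.csv",
    retrieved_on=date(2024, 1, 1),
    license_notes="Public domain (assumed for this example)",
)
```

Validating at construction time keeps incomplete records out of the catalog from the start, which is cheaper than auditing for missing provenance later.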

I am not trying to replace Data.gov. I want a workspace that respects how individual developers and researchers actually work.

Technical direction

Stack is still settling. Likely Python for ingestion scripts, a simple API layer, and a frontend when the data model stops changing every week.

Early milestones:

Catalog schema (source, title, format, update cadence)
One end-to-end ingest pipeline for a single agency feed
Search and filter over what has been collected
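
The first and third milestones could be sketched with nothing more than the standard-library sqlite3 module. This is one possible shape under stated assumptions, not the settled design: the datasets table, the column names (taken from the schema fields listed above), and the open_catalog, add_dataset, and search helpers are all hypothetical.

```python
import sqlite3

# Hypothetical catalog table mirroring: source, title, format, update cadence.
SCHEMA = """
CREATE TABLE IF NOT EXISTS datasets (
    id             INTEGER PRIMARY KEY,
    source         TEXT NOT NULL,
    title          TEXT NOT NULL,
    format         TEXT NOT NULL,
    update_cadence TEXT,
    UNIQUE (source, title)
)
"""


def open_catalog(path=":memory:"):
    """Open (or create) the catalog database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(SCHEMA)
    return conn


def add_dataset(conn, source, title, fmt, update_cadence=None):
    """Insert one catalog entry; an existing (source, title) pair is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO datasets (source, title, format, update_cadence) "
            "VALUES (?, ?, ?, ?)",
            (source, title, fmt, update_cadence),
        )


def search(conn, text=None, fmt=None):
    """Filter by a title substring and/or an exact format, sorted by title."""
    query = "SELECT source, title, format, update_cadence FROM datasets"
    clauses, params = [], []
    if text:
        # Note: % and _ in `text` act as LIKE wildcards; they are not escaped here.
        clauses.append("title LIKE ?")
        params.append(f"%{text}%")
    if fmt:
        clauses.append("format = ?")
        params.append(fmt)
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    query += " ORDER BY title"
    return [dict(row) for row in conn.execute(query, params)]
```

Usage would look like add_dataset(conn, "example-agency", "Transit ridership", "json", "weekly") followed by search(conn, text="transit"). SQLite's LIKE is case-insensitive for ASCII by default, which is usually what a title search wants.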

Related work on this site

GovDataHub sits alongside other side projects: SeedStarter for gardening and Browser Listener for local Facebook session capture. Each solves a narrow problem I have repeatedly.

Once work resumes, I will post updates as the catalog and ingestion pieces land. If public data organization is your kind of rabbit hole, the repo is on GitHub.
