Data sources
Every record in this directory traces back to one or more open public datasets. We use only datasets with permissive open licenses. We do not use contractually restricted commercial APIs, and we do not use data scraped from sites that prohibit scraping.
Primary sources
Overture Maps Places
- License: CDLA-Permissive-2.0 / CC-BY 4.0 / Apache-2.0 (varies by upstream contributor).
- Refresh: monthly. We pull the latest available release on each pipeline run.
- Provides: name, address, location coordinates, primary category, contact details.
- Aggregates: data from Foursquare, Meta, Microsoft, OpenStreetMap contributors, and others.
OpenStreetMap
- License: Open Data Commons Open Database License (ODbL). Attribution required.
- Refresh: weekly via the public Overpass API.
- Provides: opening hours, cuisine, contact info, independent verification of presence.
- © OpenStreetMap contributors.
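As an illustration of the weekly Overpass pull, the sketch below builds an Overpass QL query for named features that carry an opening_hours tag inside a bounding box, and form-encodes it for a POST request. The bounding box values, the tag filter, and the function names are assumptions made for this example; they are not the pipeline's actual query.

```python
from urllib.parse import urlencode

# Assumed bounding box (south, west, north, east); illustrative values only.
EXAMPLE_BBOX = (37.12, -122.05, 37.47, -121.58)


def build_overpass_query(bbox, timeout_s=60):
    """Return an Overpass QL query for named nodes and ways with opening hours."""
    south, west, north, east = bbox
    box = f"{south},{west},{north},{east}"
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        "(\n"
        f'  node["name"]["opening_hours"]({box});\n'
        f'  way["name"]["opening_hours"]({box});\n'
        ");\n"
        # "center" adds a single coordinate for ways; "tags" returns tags only.
        "out tags center;"
    )


def encode_request_body(query):
    """Overpass interpreters accept the query as a form-encoded 'data' field."""
    return urlencode({"data": query}).encode("utf-8")


query = build_overpass_query(EXAMPLE_BBOX)
body = encode_request_body(query)
```

The encoded body would be sent as a POST to an Overpass interpreter endpoint. Keeping query construction separate from the network call makes the query easy to inspect and test offline.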
California Department of Alcoholic Beverage Control
- License: California public records.
- Refresh: daily (ABC publishes an updated CSV at 7 AM PT each business day).
- Provides: alcohol license type, number, status, and dates for every licensed business in California.
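A minimal sketch of how a daily CSV like this could be filtered down to active licenses in one city. The column names (city, status), the status value ACTIVE, and the sample rows are all hypothetical; the real ABC export may use different headers and codes.

```python
import csv
import io


def active_licenses_in_city(csv_text, city, city_col="city", status_col="status"):
    """Return rows whose city matches and whose status is ACTIVE.

    Column names and the ACTIVE status value are assumptions for illustration.
    """
    wanted = city.strip().upper()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        # "or ''" guards against short rows, where DictReader fills in None.
        if (row.get(city_col) or "").strip().upper() == wanted
        and (row.get(status_col) or "").strip().upper() == "ACTIVE"
    ]


# Invented sample data, not real license records.
sample = (
    "license_number,city,status\n"
    "1,SAN JOSE,ACTIVE\n"
    "2,San Jose,CANCELED\n"
    "3,OAKLAND,ACTIVE\n"
)
rows = active_licenses_in_city(sample, "San Jose")
```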
Santa Clara County Department of Environmental Health
- License: Santa Clara County public records.
- Refresh: daily via Socrata SODA API.
- Provides: food facility inspection results, scores, and violation details.
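For readers unfamiliar with SODA, the sketch below shows how an incremental daily request could be expressed with the standard SODA query parameters $where, $order, $limit, and $offset. The domain, the dataset identifier, and the inspection_date column are placeholders invented for this example; substitute the values published for the actual dataset.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, since_iso,
                   date_column="inspection_date", limit=1000, offset=0):
    """Build a SODA resource URL for rows newer than `since_iso`.

    `date_column` is a hypothetical column name used for illustration.
    """
    params = {
        "$where": f"{date_column} > '{since_iso}'",
        "$order": date_column,  # a fixed sort order keeps paging predictable
        "$limit": limit,
        "$offset": offset,
    }
    return f"https://{domain}/resource/{dataset_id}.json?{urlencode(params)}"


# Placeholder domain and dataset id, not real identifiers.
url = build_soda_url("data.example.gov", "abcd-1234", "2024-01-01T00:00:00")
```

Paging through a large result set would repeat the request with offset increased by limit until a page comes back with fewer rows than limit.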
Wikidata
- License: Creative Commons Zero (CC0) — public domain.
- Refresh: weekly via Wikidata Query Service (SPARQL).
- Provides: founders, founding dates, parent organizations, and other structured facts about notable San Jose businesses.
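The kind of SPARQL query involved can be sketched as follows. It selects items headquartered in a given city (P159) and optionally their founder (P112), inception date (P571), and parent organization (P749). The choice of P159 as the city filter, the function names, and the result limit are assumptions for illustration. The city's item ID is passed in as a parameter and should be verified on Wikidata before use.

```python
import re
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_business_query(city_qid, limit=500):
    """Return a SPARQL query for organizations headquartered in `city_qid`."""
    # Reject anything that is not a plain item ID, to avoid query injection.
    if not re.fullmatch(r"Q\d+", city_qid):
        raise ValueError(f"not a Wikidata item ID: {city_qid!r}")
    return f"""
SELECT ?business ?businessLabel ?founderLabel ?inception ?parentLabel WHERE {{
  ?business wdt:P159 wd:{city_qid} .             # headquarters location
  OPTIONAL {{ ?business wdt:P112 ?founder . }}   # founded by
  OPTIONAL {{ ?business wdt:P571 ?inception . }} # inception date
  OPTIONAL {{ ?business wdt:P749 ?parent . }}    # parent organization
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def build_wdqs_url(query):
    """Return a GET URL asking the query service for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Q16553 is assumed here to be the item for San Jose; confirm before relying on it.
example_query = build_business_query("Q16553", limit=10)
example_url = build_wdqs_url(example_query)
```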
Sources we explicitly do NOT use
- Google Places API: its terms of service forbid persistent storage and republication of most fields.
- Yelp Fusion API: comparable restrictions on storing and displaying content, plus a mandatory link-back.
- Scraped Yelp or Google data: scraping raises terms-of-service and trespass-to-chattels concerns.
- California Secretary of State bulk extracts: a paid product (roughly $2,000 or more per request).
Known gaps
- City of San Jose business tax certificates: not currently published on the city's open data portal. We compensate by inferring "actively operating" status from a business's presence in multiple sources and from active state licenses.
- California Department of Consumer Affairs licensee data: DCA's bulk download sits behind a custom portal that we have not yet integrated. Adding it would bring contractor, cosmetology, and other professional licensing data into the directory.
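The "actively operating" inference used to work around the missing business tax data could look something like the sketch below. The specific rule (an active state license plus at least one source listing, or presence in at least two sources) and all names are assumptions chosen for illustration; the directory's real thresholds may differ.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessEvidence:
    # Names of the datasets in which the business appears, e.g. {"overture", "osm"}.
    sources: set = field(default_factory=set)
    # True if any state license record (for example, an ABC license) is active.
    has_active_state_license: bool = False


def is_actively_operating(evidence, min_sources=2):
    """Hypothetical rule combining multi-source presence and license status."""
    # An active license corroborated by at least one listing is treated as enough.
    if evidence.has_active_state_license and evidence.sources:
        return True
    # Otherwise require the business to appear in several sources.
    return len(evidence.sources) >= min_sources


listed_twice = BusinessEvidence(sources={"overture", "osm"})
licensed_once = BusinessEvidence(sources={"overture"}, has_active_state_license=True)
listed_once = BusinessEvidence(sources={"osm"})
```

Under this rule, listed_twice and licensed_once would count as actively operating, while listed_once would not.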