San Francisco has roughly 11,000 food establishments. The city inspects them, documents the results, and publishes the data on its open data portal. So far, so normal.
Here’s the catch: since 2016, San Francisco has published that data in three different formats. The result is that a single restaurant’s inspection history might span three different datasets, each with different fields, different scoring logic, and different ideas about what constitutes a “result.”
Three Eras of SF Inspection Data
2016–2019: Numeric scores. The city published a score per inspection on a 100-point scale. Deductions for violations. Higher is better. Clean and workable.
2020–2023: Pass/fail. The city switched to a binary system. Restaurants either passed or didn’t. No numeric score. The violation data was still there, but the summary output changed completely. If you were tracking a restaurant across this transition, its record went from numbers to words overnight.
2024–present: Hybrid. The city moved to a system that publishes both a result and violation-level data, but in a format that doesn’t match either of the previous two periods.
We had to build a pipeline that ingests all three datasets, maps them to a common schema, and produces a unified inspection history per restaurant. A restaurant that’s been open since 2015 might have numeric scores from its early inspections, pass/fail results from the middle years, and hybrid records from recent visits — all feeding into the same score.
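The mapping step can be sketched as a set of per-era adapters that all emit one common record type. This is a minimal illustration, not the production pipeline; the field names (`inspection_date`, `inspection_score`, `result`) are hypothetical stand-ins for whatever each dataset actually calls them.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, List

@dataclass
class Inspection:
    """Common schema every era maps into (field names are illustrative)."""
    business_id: str
    inspected_on: date
    raw_score: Optional[float]   # numeric score, if the era provided one
    result: Optional[str]        # "pass"/"fail" text, if the era provided one
    violations: List[str] = field(default_factory=list)

def from_numeric_era(row: dict) -> Inspection:
    # 2016-2019 rows carry a 100-point numeric score
    return Inspection(
        business_id=row["business_id"],
        inspected_on=date.fromisoformat(row["inspection_date"]),
        raw_score=float(row["inspection_score"]),
        result=None,
        violations=row.get("violations", []),
    )

def from_passfail_era(row: dict) -> Inspection:
    # 2020-2023 rows carry only a textual pass/fail result
    return Inspection(
        business_id=row["business_id"],
        inspected_on=date.fromisoformat(row["inspection_date"]),
        raw_score=None,
        result=row["result"].lower(),
        violations=row.get("violations", []),
    )
```

The hybrid-era adapter would populate both `raw_score` (reconstructed) and `result`; downstream code only ever sees `Inspection` objects, so the scoring logic never has to know which dataset a record came from.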
The Scoring Approach
Because SF published numeric scores for part of its history and still publishes violation-level data, we use the raw-score model — same approach as Dallas. We take each inspection’s score, weight it for recency, and compute a weighted average.
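The recency weighting can be sketched as an exponentially decayed average, where older inspections count for less. The half-life value here is an illustrative assumption, not the actual parameter used in production.

```python
from datetime import date

def recency_weighted_score(scored, as_of, half_life_days=365.0):
    """Weighted average of (inspection_date, score) pairs.

    Each inspection's weight halves every `half_life_days` of age,
    so recent visits dominate without old history vanishing entirely.
    """
    num = den = 0.0
    for inspected_on, score in scored:
        age_days = (as_of - inspected_on).days
        weight = 0.5 ** (age_days / half_life_days)
        num += weight * score
        den += weight
    return num / den
```

With a one-year half-life, an inspection from a year ago carries half the weight of one from today; one from three years ago carries an eighth.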
For the pass/fail era, we reconstruct scores from the violation data where possible. For inspections that only have a pass/fail result with no granular violation detail, we use the result itself as the input — a clean pass gets a high implied score, a failure gets a low one.
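A minimal sketch of both fallbacks. The deduction table and the implied pass/fail anchors below are invented for illustration; they are not SF's actual point values or the site's real parameters.

```python
# Hypothetical per-violation deductions by risk category (illustrative values).
DEDUCTIONS = {"low": 2.0, "moderate": 4.0, "high": 7.0}

# Hypothetical implied scores when only a pass/fail result exists.
IMPLIED = {"pass": 96.0, "fail": 70.0}

def reconstruct_score(violation_categories):
    """Approximate a 100-point score from violation-level data:
    start at 100, deduct per violation, floor at zero."""
    total = sum(DEDUCTIONS.get(cat, 2.0) for cat in violation_categories)
    return max(0.0, 100.0 - total)

def implied_score(result):
    """Fallback when no granular violation detail exists."""
    return IMPLIED[result.lower()]
```

The key design point is that both paths emit a number on the same 100-point scale as the 2016–2019 era, so every inspection feeds the same weighted average regardless of which fallback produced it.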
It’s not perfect. A reconstructed score from violation data isn’t identical to a score that an inspector computed on the spot. But it’s better than ignoring three years of inspection history because the city changed its paperwork.
Why This Matters Beyond SF
San Francisco’s data problem isn’t unique — it’s just the most compressed example. Government agencies change their data formats. They update their code books. They migrate to new platforms. The data doesn’t go away, but it stops being consistent.
Any system that scores restaurants over time has to deal with this. A scoring model that only works on the current format will lose history every time a city updates its approach. A scoring model that stitches datasets together will have seams — places where the data quality changes — but it preserves continuity.
We chose continuity. A restaurant’s full history, warts and format changes included, tells a better story than just the last two inspections.
The Upside
For all its quirks, SF’s data is detailed at the violation level. The city documents specific violations with descriptions, not just codes, which makes the inspection history more readable than most cities. When you look up a San Francisco restaurant on Eat or Beat, the violation text tends to be more descriptive than what you’d see in a Chicago or Dallas record.
The data is messy underneath. The surface is clean. That’s the job.
Every score on Eat or Beat is computed from public health-department records. We don’t visit restaurants. We don’t accept payments from restaurants. We translate what’s already on file.