
GCP + Cloud Run + FastAPI: the stack I keep reaching for

Why this combination. What it costs. What breaks. What I'd change.

Three different production systems in the last three years. All three ended up on a similar stack: GCP, Cloud Run for compute, FastAPI for the API layer. This is not religious conviction. It's the result of running things in production and noticing what causes problems and what doesn't.

Why GCP

The ML tooling is the main reason. Vertex AI, BigQuery ML, Cloud Vision API, the GPU availability via GCE: if your product involves ML in any significant way, GCP's tooling is meaningfully better integrated than the alternatives. The data tooling (BigQuery specifically) is excellent and pairs well with the ML workflows.

The second reason is the compliance posture for European enterprise clients. The Frankfurt region (europe-west3) gives us EU data residency, which matters for the clients I described in the compliance post from last year. GCP's compliance certifications and the DPA terms have been acceptable to more of our enterprise clients' legal teams than the alternatives.

The downside: GCP's console and documentation quality is inconsistent. Some services are well-documented and have good developer experience. Others are clearly products of internal tooling that made it to GA without much attention to the developer path. You learn which is which.

Why Cloud Run

Containers that scale to zero and scale up quickly, with no infrastructure to manage. The mental model is simple: build a container, deploy it, it runs when it gets traffic and costs nothing when it doesn't. For API services with spiky traffic, the cost profile is excellent.

The cold start latency is the tradeoff. Cloud Run containers cold-start in 1-3 seconds depending on the image size and initialization work. For latency-sensitive endpoints, this means minimum instances. Minimum instances cost money. The tuning question is: what's the minimum number of warm instances that keeps the tail latency acceptable?

For the Wasteer API layer, we run with two minimum instances per region. That's the number where the p99 latency is inside our SLA without paying for more warm capacity than we need.

Container size matters more than most teams optimize for. A smaller image cold-starts faster and pulls faster. We spend time keeping base images lean. The difference between a 500MB and a 2GB container image is 1-2 seconds of cold start time. Worth the effort.

Why FastAPI

Python async, automatic OpenAPI documentation, Pydantic validation, performance that's close enough to Go for our use cases. The ML and data science ecosystem is Python. Using FastAPI means the API layer and the ML components speak the same language and share libraries without FFI overhead.

The async support is the practical differentiator. Most of what our handlers do is I/O: database queries, external API calls, model inference requests. With async FastAPI and proper async database drivers, one worker handles many concurrent requests without blocking on any single one. The throughput is high enough for our current scale.

The automatic documentation is underrated as a team productivity feature. Every endpoint is documented the moment it's written. Client integrations happen faster. Internal API consumers can find what they need without asking.

What breaks

Background tasks in Cloud Run. Once the response is sent, Cloud Run throttles the container's CPU and may shut the instance down, so in-process background tasks started during a request may never finish. We've hit this with work that needs to run after the response goes out. The fix is a task queue (Cloud Tasks) rather than in-process background tasks. That's the right architecture anyway. It was still an unpleasant discovery.
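The pattern, sketched with a pure helper (the queue, project, and URL names are placeholders; the dict shape is what the google-cloud-tasks client's create_task accepts for an HTTP task):

```python
import json

def build_http_task(url: str, payload: dict) -> dict:
    """Build an HTTP task body for Cloud Tasks. The enqueued work runs as a
    separate HTTP request with its own container lifecycle, instead of riding
    on the original request's instance."""
    return {
        "http_request": {
            "http_method": "POST",
            "url": url,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(payload).encode(),
        }
    }

# Enqueueing (requires google-cloud-tasks and an existing queue; all names
# below are placeholders):
#   client = tasks_v2.CloudTasksClient()
#   parent = client.queue_path("my-project", "europe-west3", "background-work")
#   client.create_task(request={"parent": parent,
#                               "task": build_http_task(worker_url, payload)})
```

The request handler returns as soon as the task is enqueued; Cloud Tasks handles retries and delivery to the worker endpoint.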

Secrets management across environments. Managing environment-specific secrets (dev, staging, prod) requires a structured approach from the start. We used environment variables early and migrated to Secret Manager later. The migration was tedious. Build the Secret Manager integration first.

What I'd change

Nothing structural. The stack is right for the problem set. The things I'd do differently are operational: better container size discipline from the start, Secret Manager from day one, Cloud Tasks instead of in-process background work.

With gusto, Fatih.