Deployment Strategies
Learn how to deploy FastAPI applications in various environments to ensure scalability and reliability.
Deployment on Uvicorn — FastAPI Goes Live (and Stays Sane)
"You wrote async code that hums like a caffeinated orchestra. Now let’s make sure the audience doesn’t hear the conductor sneeze." — Your slightly dramatic TA
You already know how to write async endpoints, await the right things, and avoid blocking the event loop (shout-out to our previous section on Advanced Async Patterns). Deployment isn’t just about starting a process — it’s about choosing the right runtime configuration, process model, and operational guardrails so your app stays fast and doesn’t crash during peak coffee orders.
What this page gives you
- Practical, production-ready ways to run FastAPI with Uvicorn
- When to use simple uvicorn, when to pair with Gunicorn, and how to containerize or run under systemd
- Performance tuning tips (uvloop, workers, timeouts), logging, graceful shutdown, and gotchas related to async lifecycles
Quick refresher (assumed knowledge)
You’ve learned async patterns and how to use async libraries. Deployment choices must respect those patterns: don't let blocking code sabotage your event loop; offload CPU-bound work; handle startup/shutdown events reliably across processes.
Running Uvicorn — Basics (the commands you’ll use)
Start local dev server (not production-ready):
uvicorn myapp.main:app --reload --host 0.0.0.0 --port 8000
Production starter (single process):
uvicorn myapp.main:app --host 0.0.0.0 --port 8000 --log-level info --proxy-headers
Key flags:
- --reload: developer-only. Never use in production.
- --proxy-headers: use when behind NGINX or a load balancer so client IPs and headers are preserved.
- --workers N: spawns N processes (useful, but process management is better done by Gunicorn or a supervisor).
Uvicorn vs Gunicorn+UvicornWorker: When to pick what
| Approach | Pros | Cons | Use when... |
|---|---|---|---|
| Uvicorn alone (uvicorn --workers) | Simple, fast, low overhead | Lacks mature process management features (restarts, graceful reloading) | Small services, single server, Kubernetes pods where a controller handles restarts |
| Gunicorn + UvicornWorker | Robust process management, better ecosystem | Adds a layer, slightly more config | Traditional deployments, systemd-managed servers, when you want pre-fork model control |
| Uvicorn in Docker/Kubernetes | Container-friendly, horizontally scalable | Requires orchestration knowledge | Cloud-native deployments, autoscaling |
Example Gunicorn command:
gunicorn -k uvicorn.workers.UvicornWorker myapp.main:app -w 4 --bind 0.0.0.0:8000
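The same flags can live in a Gunicorn config file, which is plain Python. The sketch below mirrors the command above and adds a few timeout settings; the specific numbers are illustrative assumptions, not recommendations from this page, and should be tuned against real traffic.

```python
# gunicorn.conf.py: load with `gunicorn -c gunicorn.conf.py myapp.main:app`
import multiprocessing

# Address and port to listen on, same as --bind.
bind = "0.0.0.0:8000"

# Run the ASGI app through Uvicorn's Gunicorn worker, same as -k.
worker_class = "uvicorn.workers.UvicornWorker"

# Gunicorn's docs suggest (2 x cores) + 1 as a starting point; measure and adjust.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a silent worker may go before it is killed and restarted (assumed value).
timeout = 30

# Seconds workers get to finish in-flight requests after a restart signal (assumed value).
graceful_timeout = 30

# Seconds to hold idle keep-alive connections open (assumed value).
keepalive = 5

# "-" sends access logs to stdout, which suits containers.
accesslog = "-"
```

Keeping these values in a file rather than on the command line makes them easier to review and to reuse across systemd units and container images.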
Tuning performance
- Use uvloop: pip install uvloop. Uvicorn's default --loop auto setting picks it up when it is installed, and it typically improves throughput and latency on Unix.
- Workers vs threads: a single event loop runs on one CPU core, so for async I/O-bound work, add worker processes to use multiple cores. For CPU-bound work, offload to ProcessPoolExecutor.
- Keep blocking code out of the event loop. If you must: wrap synchronous calls in run_in_executor or a background task.
- Set sensible timeouts (proxy, gunicorn, load balancer) to avoid hanging connections.
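The offloading advice above can be sketched with the standard library alone. Here `blocking_io` and `cpu_bound` are hypothetical stand-ins for a synchronous client call and a CPU-heavy job; in a real app the executor would be created once at startup rather than per request.

```python
import asyncio
import hashlib
import time
from concurrent.futures import Executor, ProcessPoolExecutor


def blocking_io(delay: float) -> str:
    """Hypothetical stand-in for a synchronous client call."""
    time.sleep(delay)
    return "done"


def cpu_bound(data: bytes, rounds: int) -> str:
    """Hypothetical stand-in for CPU-heavy work: repeated hashing."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


async def handle_request(executor: Executor) -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    # Blocking I/O goes to the default thread pool, so the event loop stays free.
    io_result = await asyncio.to_thread(blocking_io, 0.01)
    # CPU-bound work goes to the executor passed in (a process pool in production).
    cpu_result = await loop.run_in_executor(executor, cpu_bound, b"payload", 1000)
    return io_result, cpu_result


async def main() -> tuple[str, str]:
    # One process is enough for the sketch; size the pool to spare CPU cores.
    with ProcessPoolExecutor(max_workers=1) as pool:
        return await handle_request(pool)
```

Running `asyncio.run(main())` from a script exercises both paths. Threads suit blocking I/O because the GIL is released while waiting; a process pool is the safer choice for pure-Python CPU work.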
Example: enabling uvloop explicitly in Python run:
import uvicorn

if __name__ == "__main__":
    # loop="uvloop" asks Uvicorn to set up uvloop itself, so no manual
    # event loop policy is needed. Requires `pip install uvloop`.
    uvicorn.run("myapp.main:app", host="0.0.0.0", port=8000, loop="uvloop")
Graceful startup and shutdown — the real drama
Use FastAPI's lifespan handler (or the older startup/shutdown events, which newer FastAPI releases deprecate) for DB connections, caches, or long-lived clients. If you have multiple processes, remember: each process runs its own startup and shutdown code. Beware of singleton resource initializations that should run once; coordinate those externally (migrations job, init container).
Important: signal handling is done by the process manager. Gunicorn handles it for its workers; uvicorn in --workers mode also manages its child processes, but with fewer features than Gunicorn.
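A minimal sketch of the per-process lifecycle, written so it runs without FastAPI installed. `FakePool` is a hypothetical stand-in for a real connection pool, and the `SimpleNamespace` object only mimics the `app.state` attribute that a FastAPI app exposes.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


class FakePool:
    """Hypothetical stand-in for a real async DB pool or HTTP client."""

    def __init__(self) -> None:
        self.is_open = False

    async def connect(self) -> None:
        self.is_open = True

    async def close(self) -> None:
        self.is_open = False


@asynccontextmanager
async def lifespan(app):
    # Runs once per worker process, before that worker accepts requests.
    pool = FakePool()
    await pool.connect()
    app.state.pool = pool
    try:
        yield
    finally:
        # Runs when the worker shuts down cleanly.
        await pool.close()


async def demo() -> bool:
    # SimpleNamespace mimics `app.state` so the sketch runs without FastAPI.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        opened = app.state.pool.is_open
    # True when the pool was open inside the block and closed afterwards.
    return opened and not app.state.pool.is_open
```

With FastAPI installed, the same function would be passed as `FastAPI(lifespan=lifespan)`. With four workers, four pools get opened, so size database connection limits for the total across workers.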
Logging & observability
- Enable access logs: --access-log, or configure via Gunicorn logging.
- Use structured logs (JSON) for easy downstream parsing.
- Expose /metrics for Prometheus and hook up tracing (OpenTelemetry) early.
Quick example enabling access logs:
uvicorn myapp.main:app --access-log --log-level info
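For the structured-logs point, one possible approach is a small JSON formatter built on the standard `logging` module. The field names below are arbitrary choices for illustration, and `configure_json_logging` is a hypothetical helper; a maintained JSON logging library may serve you better in practice.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(level: int = logging.INFO) -> None:
    """Route Uvicorn's loggers through the JSON formatter."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    # "uvicorn.error" and "uvicorn.access" are the logger names Uvicorn uses.
    for name in ("uvicorn.error", "uvicorn.access"):
        logger = logging.getLogger(name)
        logger.handlers = [handler]
        logger.setLevel(level)
        logger.propagate = False
```

Call `configure_json_logging()` after Uvicorn has applied its own logging config (for example during app startup), since Uvicorn's default config would otherwise replace these handlers.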
Common operational setups
- systemd (single server)
Example unit (systemd):
[Unit]
Description=My FastAPI app
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/srv/myapp
ExecStart=/usr/local/bin/gunicorn -k uvicorn.workers.UvicornWorker myapp.main:app -w 4 -b 127.0.0.1:8000
Restart=always
[Install]
WantedBy=multi-user.target
- Dockerfile (simple)
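FROM python:3.11-slim
WORKDIR /app
# Copy the dependency list first so this layer stays cached between code changes.
# Assumes requirements.txt lists fastapi, uvicorn, and gunicorn.
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
EXPOSE 8000
CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "myapp.main:app", "-w", "4", "-b", "0.0.0.0:8000"]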
- Kubernetes
- Deploy as Deployment with Liveness/Readiness probes hitting small endpoints
- Use HorizontalPodAutoscaler based on CPU or custom metrics
- Let an Ingress/Service handle TLS termination
Reverse proxy & TLS
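Terminate TLS at NGINX or a cloud load balancer. Pass --proxy-headers so Uvicorn reads the client address and scheme from the X-Forwarded-* headers. By default it only trusts those headers from 127.0.0.1, so set --forwarded-allow-ips when the proxy runs on another host. Tune keepalive on the proxy to avoid piling up connections.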
NGINX snippet (simple):
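proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Tells the app whether the original request was http or https.
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $host;
proxy_pass http://127.0.0.1:8000;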
Gotchas & checklist (read before you press Deploy)
- Never run with --reload in prod.
- If you use background tasks or startup hooks that create connections, ensure they behave when multiplied by worker count.
- Offload CPU-bound tasks — don’t block the event loop.
- Configure health checks and graceful shutdowns so load balancers stop sending traffic to exiting pods/processes.
- Watch file descriptors / ulimit if serving many concurrent connections.
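The file descriptor item can be checked from Python on Unix with the standard `resource` module. This is a rough sketch: `enough_fds`, `check_current_limit`, and the reserve of 64 descriptors for logs, database sockets, and similar overhead are assumptions made for illustration.

```python
import resource


def enough_fds(soft_limit: int, expected_connections: int, reserve: int = 64) -> bool:
    """Return True if the soft limit covers the expected sockets plus a reserve."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= expected_connections + reserve


def check_current_limit(expected_connections: int) -> bool:
    """Compare this process's RLIMIT_NOFILE soft limit against the expected load."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return enough_fds(soft, expected_connections)
```

Each open connection consumes one descriptor per worker process, so run the check with the per-worker connection count you expect. Under systemd, the limit is raised with `LimitNOFILE=` in the unit file.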
Final takeaways
- Uvicorn is fast and lightweight — excellent for FastAPI. For production, pair with a process manager (Gunicorn, systemd, Kubernetes) unless your environment already supplies orchestration.
- Respect async: avoid blocking, use uvloop, and plan for multi-process semantics.
- Monitor, log, and automate: graceful shutdowns, metrics, and health checks are not optional.
Be pragmatic: start simple (one managed process behind a proxy), measure, then scale horizontally with containers or Gunicorn workers. And if something weird happens, check for blocking calls first — it’s usually the event loop having a tantrum.
Next up (suggested): add observability — metrics, tracing, and structured logs so when your async app goes wild, you’ll actually know why.