Deployment Strategies
Learn how to deploy FastAPI applications in various environments to ensure scalability and reliability.
Using Gunicorn with FastAPI — Because Uvicorn Alone Is Not a Process Manager (Sorry, Uvicorn)
"Uvicorn runs the app. Gunicorn makes sure the app doesn't cry when traffic arrives." — Not an official quote, but true.
Hook: Why are we even mixing these two?
You already learned how to run FastAPI with Uvicorn (dev server, super-fast async engine). You also explored asynchronous programming and saw how FastAPI shines under IO-bound loads. But what happens when your single Uvicorn process faces the real world: traffic spikes, memory leaks, graceful restarts, and the general chaos of production? That's where Gunicorn steps in: it's a battle-tested process manager. Pair it with Uvicorn workers and you get ASGI performance plus production-grade process control.
Think of it like this: Uvicorn is a race car; Gunicorn is the pit crew, the strategist, and the spare tires.
Quick overview: Who does what?
- Uvicorn — an ASGI server and lightning-fast event loop implementation. Great at handling async I/O.
- Gunicorn — a pre-fork worker manager (process supervisor), gives you multiple workers, graceful reloads, signal handling, and other production niceties.
- The combo — use Gunicorn to spawn multiple Uvicorn workers (via `uvicorn.workers.UvicornWorker`). You get the best of both worlds.
Why not just use uvicorn --workers? You can, but Gunicorn provides more mature process control, better signal handling, and more production features (preloading, graceful upgrades, logging conventions). Also many ops teams already know Gunicorn.
Basic command: run FastAPI with Gunicorn + Uvicorn workers
gunicorn myapp.main:app \
--workers 4 \
--worker-class uvicorn.workers.UvicornWorker \
--bind 0.0.0.0:8000 \
--log-level info
- `myapp.main:app` — the `module:attribute` path to your FastAPI `app` object.
- `--worker-class uvicorn.workers.UvicornWorker` — critical: tells Gunicorn to use an ASGI-capable Uvicorn worker.
- `--workers` — number of worker processes. More on sizing below.
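Under the hood, that `module:attribute` string is just an import plus a `getattr`. A minimal sketch of the idea (simplified; Gunicorn's real loader handles more cases):

```python
import importlib

def load_app(spec: str):
    """Resolve a 'module.path:attribute' spec, in simplified form:
    import the module, then grab the named attribute from it."""
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# e.g. load_app("myapp.main:app") would return your FastAPI instance
```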
Example Python-based Gunicorn config (clean and repeatable)
# gunicorn_conf.py
import multiprocessing
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:8000"
timeout = 30
keepalive = 2
loglevel = "info"
accesslog = "-" # write access log to stdout
errorlog = "-" # write error log to stdout
max_requests = 1000 # recycle workers periodically to mitigate memory leaks
max_requests_jitter = 50
preload_app = False # careful with asyncio and DB connections if True
Notes:
- `preload_app = True` loads the app in the master before forking (saves memory via copy-on-write), but can break async resources or DB connections — use with caution.
- `max_requests` helps avoid memory bloat by restarting workers after a set number of requests.
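The jitter exists so all workers don't restart at the same moment. Each worker's restart threshold ends up somewhere in a randomized window, roughly like this (a sketch of the idea, not Gunicorn's internal code):

```python
import random

max_requests = 1000
max_requests_jitter = 50

# Each worker draws its own limit from [1000, 1050], so recycling
# is staggered instead of hitting every worker simultaneously.
limit = max_requests + random.randint(0, max_requests_jitter)
print(limit)
```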
Systemd unit for production (example)
[Unit]
Description=gunicorn daemon for myapp
After=network.target
[Service]
User=www-data
Group=www-data
WorkingDirectory=/srv/myapp
Environment="PATH=/srv/myapp/venv/bin"
ExecStart=/srv/myapp/venv/bin/gunicorn \
--config /srv/myapp/gunicorn_conf.py \
myapp.main:app
[Install]
WantedBy=multi-user.target
This gives you automatic restarts, logs integrated with journald, and easy deploy ergonomics.
Dockerfile snippet (production-ready-ish)
FROM python:3.11-slim
WORKDIR /app
COPY pyproject.toml requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
ENV PYTHONUNBUFFERED=1
CMD ["gunicorn", "--config", "gunicorn_conf.py", "myapp.main:app"]
Make sure requirements.txt includes uvicorn[standard] and gunicorn.
Tuning & scaling: not all apps are created equal
- Worker count heuristics:
- For sync apps: common rule is (2 x CPU) + 1.
- For async FastAPI apps (IO-bound): fewer workers may be fine because each worker handles many concurrent connections — still, run at least 1 worker per CPU as a starting point and load-test.
- If your app is CPU-bound (image processing, heavy math), increase processes and consider moving heavy tasks to background workers (Celery, RQ).
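Those heuristics can be captured in a tiny helper (a hypothetical function, just to make the rules above concrete):

```python
def suggested_workers(cpu_count: int, io_bound: bool = True) -> int:
    """Return a starting worker count; always confirm with a load test."""
    if io_bound:
        # Async FastAPI apps: each worker multiplexes many connections,
        # so one worker per CPU is a reasonable floor.
        return max(1, cpu_count)
    # Sync apps: the classic (2 x CPU) + 1 rule.
    return cpu_count * 2 + 1

print(suggested_workers(4, io_bound=False))  # 9
print(suggested_workers(4, io_bound=True))   # 4
```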
- `timeout` protects you from stuck workers.
- `keepalive` tunes connection persistence.
- `max_requests` + jitter helps mitigate memory leaks.
Test with real load (wrk, locust, hey) — heuristics are just starting points.
Signals, graceful reloads, and deploy tricks
- `SIGHUP` — reload config and gracefully restart workers
- `SIGTERM` / `SIGINT` — graceful shutdown
- `SIGUSR2` — perform binary upgrade (advanced)
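What "graceful shutdown" means in code, in miniature (a toy POSIX handler, not Gunicorn's actual implementation):

```python
import signal

shutting_down = False

def handle_term(signum, frame):
    # A real worker would stop accepting new connections here
    # and drain in-flight requests before exiting.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_term)

# Simulate the master process (or systemd) sending SIGTERM to us:
signal.raise_signal(signal.SIGTERM)
print(shutting_down)  # True
```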
Set up health checks (e.g., /health) so your load balancer knows when a worker is ready. Use --graceful-timeout in Gunicorn or configure systemd's TimeoutStopSec for smoother shutdowns.
Gotchas & caveats (read these, or learn them the hard way):
- Gunicorn is Unix-only. If you're on Windows, use alternative strategies (like Uvicorn directly or Docker Linux containers).
- `preload_app = True` can break async libraries and DB pools — test it.
- If you want HTTP/2 or advanced protocols, check compatibility (Uvicorn supports some via extras; Gunicorn + workers may vary).
- Logging: prefer writing logs to stdout/stderr in containers; let the platform collect them.
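In Python, "log to stdout" is a one-liner with the standard logging module (minimal sketch; the `myapp` logger name is just an example):

```python
import logging
import sys

# Route all app logs to stdout so Docker/Kubernetes/journald
# can collect them without any log-file management.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logging.getLogger("myapp").info("worker ready")
```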
Checklist before you push to prod
- Use `uvicorn[standard]` and `gunicorn` in your prod requirements
- Choose worker count and test under realistic load
- Configure `timeout`, `max_requests`, and `keepalive`
- Add health/liveness endpoints
- Use systemd or a container orchestrator for process supervision
- Avoid `preload_app = True` unless you know your libraries are safe
TL;DR — When to use Gunicorn with FastAPI
- Use Gunicorn + Uvicorn workers when you want production-grade process management (multiple workers, signals, graceful reloads) while keeping FastAPI's async strengths.
- Use Uvicorn alone for simple deployments, small services, or when you prefer fewer moving parts (but consider a process manager like systemd or supervisord around it).
Final takeaway: think of Uvicorn as the engine and Gunicorn as the crew chief. For production, you usually want both: async speed without the chaos.
Now go forth, tune your workers, and may your 502s be few and your throughput high. If something breaks, run a load test, bump max_requests, and have coffee ready.