We have been running Docker in production since 2018 and made every mistake possible. Fifteen lessons in order of how painfully we learned them.
One: your image is too big. We audited a client's 1.2GB Node.js image. Full Debian base, dev dependencies included, .git directory copied in. After optimization with a multi-stage build and an Alpine base: 142MB. Build and deploy times dropped 60%.
Two: always use multi-stage builds. Build stage installs everything and compiles. Production stage copies only compiled output and production dependencies.
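Three: pin your base image versions. A floating tag like node:18-alpine is repointed to new patch releases without warning, so two builds of the same Dockerfile can produce different images. Use node:18.19.0-alpine3.19 and update deliberately.
Four: do not run as root. Add a USER directive to your Dockerfile so the process runs as an unprivileged user. Three lines of configuration that sharply limit what an attacker can do after compromising the application.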
Five: health checks are not optional. Without them, Docker only knows the process is running, not if your application is working. We have seen containers where the Node process was alive but deadlocked, serving zero requests.
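A minimal sketch of what such a check might look like on the Node.js side. The /healthz path and the isHealthy callback are assumptions for illustration, not part of any setup described here. The handler returns 503 when the application-level check fails, so a probe (for example a Dockerfile HEALTHCHECK instruction) can tell a live process from a working one.

```javascript
const http = require('node:http');

// Builds a request handler that answers a health probe.
// `isHealthy` is a hypothetical callback: it should return true only when
// the app can actually do work (e.g. its database pool responded recently).
function createHealthHandler(isHealthy, path = '/healthz') {
  return function handle(req, res) {
    if (req.url !== path) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('not found');
      return;
    }
    let ok = false;
    try {
      ok = Boolean(isHealthy());
    } catch (err) {
      ok = false; // a throwing check counts as unhealthy
    }
    res.writeHead(ok ? 200 : 503, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: ok ? 'ok' : 'unhealthy' }));
  };
}

// Wraps the handler in an HTTP server; the caller decides when to listen.
function createHealthServer(isHealthy) {
  return http.createServer(createHealthHandler(isHealthy));
}
```

A deadlocked event loop never answers at all, so the probe's own timeout is what catches that case; the 503 path covers failures the process can still detect.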
Six: handle SIGTERM. docker stop sends SIGTERM, waits 10 seconds by default, then sends SIGKILL. An app that does not handle the signal gets no chance to finish in-flight requests or close connections cleanly.
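One way to handle the signal in Node.js, as a sketch rather than a definitive implementation. The function name installGracefulShutdown, the cleanup hook, and the 8-second timeout are assumptions chosen for illustration; the only fixed number is Docker's default 10-second grace period, which the timeout stays under.

```javascript
// Registers a SIGTERM handler that stops accepting connections, lets
// in-flight requests finish, and exits before Docker's grace period ends.
// `cleanup` and the 8-second default are illustrative assumptions.
function installGracefulShutdown(server, options = {}) {
  const {
    timeoutMs = 8000,          // stay under docker stop's 10-second default
    cleanup = async () => {},  // e.g. close database pools
    exit = (code) => process.exit(code),
  } = options;
  let shuttingDown = false;

  function onSigterm() {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;

    // Force a non-zero exit if draining takes too long.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();

    // server.close() stops new connections and calls back once
    // existing ones have ended.
    server.close(() => {
      Promise.resolve()
        .then(cleanup)
        .then(() => exit(0), () => exit(1))
        .finally(() => clearTimeout(timer));
    });
  }

  process.on('SIGTERM', onSigterm);
  return onSigterm;
}
```

Typical use would be installGracefulShutdown(server) right after server.listen(...). Long-lived keep-alive connections can delay the close callback, which is why the timeout exists as a backstop.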
Seven: use tini as the init process for proper signal forwarding and zombie process reaping.
Eight: log to stdout, not files. Docker captures stdout and stderr natively, and its logging drivers forward them to most logging platforms.
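A small sketch of stdout logging in Node.js, assuming one JSON object per line is an acceptable format for whatever collects the logs. The function names and field layout are illustrative, not a standard.

```javascript
// Formats one log entry as a single JSON line.
// Caller-supplied fields are spread first so they cannot overwrite
// the time, level, or message keys.
function formatLogLine(level, message, fields = {}) {
  return JSON.stringify({
    ...fields,
    time: new Date().toISOString(),
    level,
    message,
  });
}

// Writes to stdout (stderr for errors); Docker collects both streams.
function log(level, message, fields) {
  const stream = level === 'error' ? process.stderr : process.stdout;
  stream.write(formatLogLine(level, message, fields) + '\n');
}
```

Calling log('info', 'request served', { path: '/orders' }) emits a single line that the container runtime picks up without any file handling or rotation inside the container.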
Nine: secrets do not belong in images. Never COPY .env files into the image. Use runtime environment variables or a secrets manager.
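On the application side, reading secrets at runtime might look like the sketch below. The helper name loadSecrets and the variable names in the usage note are hypothetical; the idea is simply to fail fast at startup when a required variable is absent, and to never print the values.

```javascript
// Reads required secrets from the environment at startup and fails fast
// if any are missing or empty. Variable names passed in are up to the caller.
function loadSecrets(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Report names only; never log secret values.
    throw new Error(
      `Missing required environment variables: ${missing.join(', ')}`
    );
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}
```

For example, loadSecrets(['DATABASE_URL', 'API_KEY']) at boot, with the values supplied when the container starts (docker run -e or --env-file, or the platform's secrets manager) rather than baked into an image layer.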
Ten: local Docker lies. Docker on Mac runs a Linux VM. Networking, filesystem, and resource limits differ from production Linux.
Eleven: Docker Compose is not a production orchestrator. No rolling updates, no health-based restarts, no scaling across hosts. Use Swarm at a minimum, or a managed platform.
Twelve: set memory and CPU limits. Without them, one container can starve everything else on the host.
Thirteen: optimize layer caching. Copy package.json and the lockfile and install dependencies before copying application code, so the dependency layer is rebuilt only when dependencies change.
Fourteen: tag images with git commit SHA, not just "latest." Know exactly what code is deployed during incidents.
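A CI script could derive the tag with a helper along these lines. This is a sketch: the function name imageRef, the example repository, and the 12-character short SHA are illustrative choices, and it assumes a SHA-1 repository where git rev-parse HEAD prints 40 hex characters.

```javascript
// Builds an image reference tagged with a shortened git commit SHA.
// The default length of 12 is an illustrative choice, not a Docker rule.
function imageRef(repository, commitSha, length = 12) {
  if (!/^[0-9a-f]{40}$/.test(commitSha)) {
    throw new Error('expected a full 40-character lowercase hex commit SHA');
  }
  return `${repository}:${commitSha.slice(0, length)}`;
}
```

Given the output of git rev-parse HEAD, imageRef('registry.example.com/app', sha) yields a tag that maps a running container straight back to a commit.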
Fifteen: automate everything. SSH deploys are one typo from an outage. CI/CD with zero manual steps, every time.
These lessons took four years and multiple incidents. Implement them from day one.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.