We adopted Terraform in 2022 for managing twelve clients' cloud infrastructure across AWS and GCP, each previously configured through web consoles. One accidental security group deletion caused two hours of downtime because we had no record of the original configuration. Terraform solved that problem and introduced five new ones.
State management is the first. Terraform tracks infrastructure state in a file. If that file gets corrupted, lost, or out of sync with reality, Terraform can make destructive changes. Our first state file corruption happened three months in when concurrent CI pipelines corrupted the lock. Recovery took four hours of manual state manipulation. Our rules now: state in versioned S3, mandatory locking, only CI pipelines run terraform apply.
Blast radius is the second. A developer changed the name of an RDS parameter, which Terraform interpreted as "destroy the old instance, create a new one." The plan showed "1 to destroy, 1 to create." We caught it in review, but the near-miss was sobering. Now every plan is reviewed by a second person, destroy operations are highlighted, and critical resources use prevent_destroy.
Drift is the third. When someone changes something through the AWS console, Terraform does not know. The next apply might revert the change or fail. We run weekly drift detection plans.
Learning curve is the fourth. Terraform has its own language, module system, state concepts, and debugging workflow. About 15% of our infrastructure time goes to Terraform management itself.
Scope is the fifth. We tried to Terraform everything and ended up with a massive codebase that was slow to plan and hard to debug. We pulled back: Terraform manages compute, databases, networking, storage, IAM. Everything else uses platform-native tools.
Our honest assessment: worth it above three production environments or twenty cloud resources. Below that, overhead exceeds benefit. For teams in the middle ground, we now recommend Pulumi over Terraform. Infrastructure in TypeScript gets the same type checking and IDE support as application code. We have not started a new Terraform project in over a year. Start on managed platforms with no IaC. Introduce it only when complexity justifies the overhead.
About the Author
Fordel Studios
AI-native app development for startups and growing teams. 14+ years of experience shipping production software.
Your users should never see a deployment in progress. Here is how we achieve zero-downtime deployments for every project without Kubernetes or complex orchestration.
Most CI/CD pipelines we inherit are either too simple (no tests, no gates) or too complex (45-minute builds, flaky tests, nobody understands the YAML). Here is the pipeline we use on every project.
Datadog costs more than your entire infrastructure. Here is the monitoring stack we use for startup clients that costs under fifty dollars per month and catches 95 percent of issues before users report them.
We love talking shop. If this article resonated, let's connect.
Start a ConversationTell us about your project. We'll give you honest feedback on scope, timeline, and whether we're the right fit.
Start a Conversation