Terraform State Lock Error: How to Safely Unlock and Prevent State Corruption
You run terraform apply and instead of infrastructure changes, you get a wall of red text: "Error acquiring the state lock." The Lock ID is a UUID you have never seen before, and your CI/CD pipeline is dead in the water. Here is exactly how to fix it without corrupting your state file.
TL;DR
Run terraform force-unlock <LOCK_ID>. But first, check if another apply is actually running. Use aws dynamodb scan --table-name terraform-locks to inspect the lock record. If the lock is orphaned (a crashed CI runner, a killed SSH session, or a Ctrl+C during apply), force-unlock is safe. If someone on your team is actively applying changes right now, wait for them to finish. Force-unlocking during a live apply will corrupt your state.
Why Terraform Locks State
Terraform state is a single JSON file that maps your HCL configuration to real infrastructure resources. Every plan, apply, and destroy operation reads this file, computes a diff, and writes the updated state back. If two engineers (or two CI pipelines) run terraform apply at the same time against the same state file, the second write overwrites the first. Resources get orphaned. IDs get mismatched. Your state file no longer reflects reality.
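For context, here is a simplified, hypothetical excerpt of that JSON (real state files carry many more attributes per resource; the IDs and lineage value below are made up):

```json
{
  "version": 4,
  "terraform_version": "1.7.5",
  "serial": 42,
  "lineage": "3f9c1a2b-example-lineage-uuid",
  "resources": [
    {
      "mode": "managed",
      "type": "aws_instance",
      "name": "web",
      "instances": [
        {
          "attributes": {
            "id": "i-0abc123def456789",
            "instance_type": "t3.micro"
          }
        }
      ]
    }
  ]
}
```

The serial field increments on every write, which is exactly what makes concurrent writes dangerous: two processes that both read serial 42 will both try to write serial 43, and one of them wins silently.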
This is not a theoretical risk. After managing Terraform for 20+ AWS accounts, I can tell you that state corruption from concurrent writes is one of the most common and most painful infrastructure incidents a team will face. The fix is state locking: before any write operation, Terraform acquires an exclusive lock. If another process already holds the lock, Terraform refuses to proceed.
Every major backend implements locking differently:
- AWS S3 + DynamoDB: Lock record stored in a DynamoDB table. The most common setup for AWS teams.
- Google Cloud Storage (GCS): Native locking via a lock file object written alongside the state object.
- Azure Blob Storage: Blob lease mechanism. Azure acquires a lease on the state blob.
- Terraform Cloud / Enterprise: Built-in locking with a web UI to view and manage locks.
- Consul: Key/value store with session-based locking.
- PostgreSQL: Advisory locks via the pg_advisory_lock function.
Without locking, Terraform becomes a footgun at scale. Two developers running apply at the same time can produce a state file where half the resources reference infrastructure that was already modified by the other operation. Recovering from this requires hours of manual terraform import commands.
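The locking mechanism itself is simple. In the DynamoDB case, Terraform attempts a conditional write that only succeeds if no lock record exists yet. Here is a minimal in-memory sketch of that semantics (not the real backend code, just an illustration of why the second writer fails):

```python
# Sketch of DynamoDB-style state locking: a conditional put that succeeds
# only if no item with that LockID exists yet. The second writer fails
# instead of silently racing the first.

class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

class LockTable:
    def __init__(self):
        self._items = {}  # LockID -> holder

    def acquire(self, lock_id, who):
        # Mimics PutItem with ConditionExpression "attribute_not_exists(LockID)"
        if lock_id in self._items:
            raise ConditionalCheckFailed(f"lock held by {self._items[lock_id]}")
        self._items[lock_id] = who

    def release(self, lock_id):
        self._items.pop(lock_id, None)

table = LockTable()
table.acquire("prod/terraform.tfstate", "alice@laptop")
try:
    table.acquire("prod/terraform.tfstate", "runner@ci")  # blocked
except ConditionalCheckFailed as e:
    print(f"Error acquiring the state lock: {e}")
table.release("prod/terraform.tfstate")
table.acquire("prod/terraform.tfstate", "runner@ci")  # succeeds after release
```

An orphaned lock is simply the failure mode of this design: the holder dies without ever calling release, so the record sits in the table forever until someone removes it by hand.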
Step 1: Read the Error Message
When Terraform fails to acquire the lock, it prints a detailed error message. Do not skip it. Every field matters:
Error: Error acquiring the state lock
Error message: ConditionalCheckFailedException: The conditional request failed
Lock Info:
ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Path: s3://my-terraform-state/prod/terraform.tfstate
Operation: OperationTypeApply
Who: runner@ip-10-0-1-47
Version: 1.7.5
Created: 2026-04-06 09:23:14.567890 +0000 UTC
Info:
Here is what each field tells you:
- ID: The unique lock identifier. You will need this UUID for force-unlock.
- Path: Which state file is locked. Confirms you are looking at the right environment.
- Operation: What operation holds the lock (OperationTypeApply, OperationTypePlan, or OperationTypeRefresh).
- Who: The username and hostname of the process that acquired the lock. This is your biggest clue. If it says runner@ip-10-0-1-47, that is a CI runner. If it says jsmith@jsmith-laptop, go talk to jsmith.
- Created: When the lock was acquired. If this timestamp is hours old and no apply should be running, the lock is almost certainly orphaned.
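The Created-timestamp test can be reduced to a one-line rule of thumb. Here is a small helper sketch for triaging a lock by age; the one-hour threshold is an assumption, so tune it to however long your longest apply actually runs:

```python
# Sketch: classify a lock as likely-orphaned based on the Created timestamp
# from the Lock Info block. The 1-hour threshold is an assumption -- set it
# longer than your slowest real apply.
from datetime import datetime, timedelta, timezone

def lock_looks_orphaned(created_utc: datetime, now: datetime,
                        max_apply_duration: timedelta = timedelta(hours=1)) -> bool:
    """A lock older than any plausible apply duration is probably orphaned."""
    return now - created_utc > max_apply_duration

# Values taken from the example error message above
created = datetime(2026, 4, 6, 9, 23, 14, tzinfo=timezone.utc)
now = datetime(2026, 4, 6, 15, 0, 0, tzinfo=timezone.utc)
print(lock_looks_orphaned(created, now))  # a 5.5-hour-old lock: True
```

A stale timestamp is strong evidence, not proof. Always cross-check against your CI dashboard before acting on it, which is exactly what the next step covers.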
Step 2: Is the Lock Legitimate?
Before you force-unlock anything, determine whether the lock is active or orphaned. An active lock means someone (or some pipeline) is genuinely running a Terraform operation right now. An orphaned lock means the process that acquired it crashed, was killed, or lost connectivity before it could release the lock.
Check your CI/CD system first. Look at GitHub Actions, GitLab CI, or whatever pipeline tool you use. Is there a running Terraform job for this workspace? If yes, wait for it to finish.
If you cannot tell from the CI dashboard, query DynamoDB directly:
# Check the lock record in DynamoDB
aws dynamodb get-item \
--table-name terraform-locks \
--key '{"LockID": {"S": "my-terraform-state/prod/terraform.tfstate-md5"}}' \
--region us-east-1
This returns the full lock record, including the Info field with a JSON payload containing the lock holder details. If the Created timestamp is recent (within the last few minutes) and matches an active pipeline run, do not force-unlock. If the timestamp is stale (hours or days old), the lock is orphaned.
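If you are scripting this check, the Info attribute is a JSON string you can parse directly. A sketch, assuming the attribute layout used by the S3 backend (the response shape below mirrors a real get-item response, but the values are made up):

```python
# Sketch: extract the lock holder from a DynamoDB get-item response.
# The S3 backend stores lock details as a JSON string in the "Info"
# attribute; the values here are illustrative, not real.
import json

response = {
    "Item": {
        "LockID": {"S": "my-terraform-state/prod/terraform.tfstate-md5"},
        "Info": {"S": json.dumps({
            "ID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
            "Operation": "OperationTypeApply",
            "Who": "runner@ip-10-0-1-47",
            "Created": "2026-04-06T09:23:14.567890Z",
        })},
    }
}

lock_info = json.loads(response["Item"]["Info"]["S"])
print(f"{lock_info['Who']} acquired the lock at {lock_info['Created']}")
```

In a real script you would feed this the output of the aws dynamodb get-item command from above and compare the Who field against your known CI runner naming.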
You can also scan for all locks in the table:
# List all current locks
aws dynamodb scan \
--table-name terraform-locks \
--region us-east-1 \
--output table
Step 3: Force Unlock (Orphaned Locks)
Once you have confirmed the lock is orphaned, force-unlock is safe. Copy the Lock ID from the error message and run:
terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
Terraform will ask for confirmation:
Do you really want to force-unlock?
Terraform will remove the lock on the remote state.
This will allow local Terraform commands to modify this state, even though it
may still be in use. Only 'yes' is accepted to confirm.
Enter a value: yes
Terraform state has been successfully unlocked!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure.
Common scenarios where force-unlock is safe:
- CI runner crashed or was terminated mid-apply: The runner is gone, but the DynamoDB record persists.
- SSH session died: You were running apply over SSH, the connection dropped, and the remote process was killed.
- Ctrl+C during apply: Terraform tries to release the lock on interrupt, but if the process is killed hard (kill -9), the lock remains.
- Network timeout: Terraform lost connectivity to the backend during a long apply and could not release the lock.
When force-unlock is not safe: if someone is actively running terraform apply right now. Force-unlocking while a write is in progress means two processes can modify state simultaneously. This is the exact scenario locking was designed to prevent.
Step 4: Recover from Corrupted State
If force-unlock happened too late and your state is already corrupted (resources exist in AWS but not in state, or state references resources that no longer exist), you need to perform state surgery. Start by backing up the current state:
# Always back up before state surgery
terraform state pull > backup-$(date +%Y%m%d_%H%M%S).tfstate
Then assess the damage. Run a plan and look at the output carefully:
terraform plan -out=recovery.tfplan
If Terraform wants to recreate resources that already exist, those resources are missing from state. Import them:
# Import an existing EC2 instance back into state
terraform import aws_instance.web i-0abc123def456789
# Import an existing S3 bucket
terraform import aws_s3_bucket.data my-data-bucket
# Import an existing RDS instance
terraform import aws_db_instance.main my-rds-instance
If Terraform wants to destroy resources you no longer manage, remove them from state:
# Remove a resource from state without destroying it
terraform state rm aws_instance.legacy_server
For drift detection (state says one thing, AWS says another), use refresh:
# Update state to match real infrastructure
terraform refresh
Note: terraform refresh is deprecated in newer versions. Use terraform apply -refresh-only instead, which shows you the changes before applying them.
Auto-Expire Orphaned Locks with DynamoDB TTL
Set a DynamoDB TTL attribute on your lock table. If a CI runner crashes and never releases the lock, the TTL auto-expires it after a configured duration (I recommend 1 hour) instead of requiring manual intervention. Most teams do not know DynamoDB supports TTL on lock records, and this single configuration change eliminates the most common reason for manual force-unlock.
To enable TTL, add a numeric ExpirationTime attribute to your lock items and configure the TTL on the table:
# Enable TTL on the lock table
aws dynamodb update-time-to-live \
--table-name terraform-locks \
--time-to-live-specification "Enabled=true, AttributeName=ExpirationTime"
You will also need a Lambda function or wrapper script that sets the ExpirationTime attribute when creating lock records. A simple approach: wrap your terraform apply in a script that writes the TTL attribute to DynamoDB after Terraform acquires the lock.
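A minimal sketch of that wrapper idea in Python, assuming boto3 is installed and credentials are configured. The table and attribute names match the ones used earlier in this article; adapt them to your setup. The epoch calculation is the important part, since DynamoDB TTL expects a Unix timestamp in a Number attribute:

```python
# Sketch: tag the lock record with an ExpirationTime attribute so DynamoDB
# TTL can expire orphaned locks. Table/attribute names are assumptions
# matching this article's examples; the update call requires boto3 and
# AWS credentials.
import time

def ttl_epoch(now: float, hours: float = 1.0) -> int:
    """DynamoDB TTL expects a Unix epoch in seconds as a Number attribute."""
    return int(now + hours * 3600)

def tag_lock_with_ttl(lock_id: str, table: str = "terraform-locks") -> None:
    import boto3  # assumption: boto3 available with AWS credentials
    boto3.client("dynamodb").update_item(
        TableName=table,
        Key={"LockID": {"S": lock_id}},
        UpdateExpression="SET ExpirationTime = :t",
        ExpressionAttributeValues={":t": {"N": str(ttl_epoch(time.time()))}},
    )

print(ttl_epoch(1_700_000_000))  # 1700000000 + 3600 = 1700003600
```

One caveat: DynamoDB TTL deletion is best-effort and can lag behind the expiration time, so treat it as a backstop against permanently orphaned locks, not a precise timer.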
Step 5: Prevent Lock Conflicts in CI/CD
The most common source of lock errors is not two humans running apply at the same time. It is two CI/CD pipeline runs triggering simultaneously. A merge to main, a manual re-run of a failed job, a scheduled pipeline, and a PR merge all happening within seconds of each other. The fix is pipeline-level serialization.
GitHub Actions: Concurrency Groups
GitHub Actions has a built-in concurrency mechanism that ensures only one workflow run per group executes at a time:
name: Terraform Apply
on:
push:
branches: [main]
paths: ['terraform/**']
concurrency:
group: terraform-prod
cancel-in-progress: false # Do NOT cancel running applies
jobs:
apply:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: 1.7.5
- name: Terraform Init
run: terraform init
working-directory: terraform/prod
- name: Terraform Apply
run: terraform apply -auto-approve
working-directory: terraform/prod
The key setting is cancel-in-progress: false. You never want to cancel a running Terraform apply. Instead, the queued run waits for the current one to finish. If you set this to true, GitHub will kill the running apply, leaving an orphaned lock (exactly the problem we are trying to solve).
GitLab CI: resource_group
terraform-apply:
stage: deploy
resource_group: terraform-prod
script:
- terraform init
- terraform apply -auto-approve
only:
- main
Atlantis
Atlantis handles locking natively at the project level. Only one PR can hold the lock for a given project directory at a time. Other PRs that modify the same Terraform project will see a "locked by PR #123" message until the first PR is merged or the lock is released. This is one of the strongest arguments for using Atlantis in teams with many Terraform contributors.
Step 6: Backend Configuration Best Practices
A properly configured S3 backend with DynamoDB locking is the foundation of reliable Terraform state management on AWS. Here is a complete, production-ready backend block:
terraform {
backend "s3" {
bucket = "mycompany-terraform-state"
key = "prod/us-east-1/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-locks"
encrypt = true
acl = "private"
}
}
And the DynamoDB table must have a partition key named LockID of type String:
resource "aws_dynamodb_table" "terraform_locks" {
name = "terraform-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
tags = {
Name = "terraform-state-locks"
Environment = "shared"
ManagedBy = "terraform"
}
}
Critical settings most teams forget:
- Enable S3 versioning: This gives you a history of every state file version. If state gets corrupted, you can roll back to a previous version from the S3 console. Without versioning, a corrupted state write is permanent.
- Enable server-side encryption: State files contain sensitive data (passwords, connection strings, resource IDs). The encrypt = true flag enables AES-256 encryption at rest.
- Block public access on the S3 bucket: Your state file should never be publicly accessible. Apply the S3 Block Public Access settings at the bucket level. For more on AWS security, see our AWS security checklist for production.
- Use PAY_PER_REQUEST billing: Lock operations are infrequent. Provisioned capacity is wasteful for a lock table.
resource "aws_s3_bucket_versioning" "state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_s3_bucket_public_access_block" "state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
Step 7: Multi-Team Terraform
Lock conflicts become exponentially worse as teams grow. Two engineers working on the same monolithic state file will block each other constantly. The solution is to split state into smaller, scoped files.
State File Per Environment and Service
Instead of one massive state file, structure your Terraform into separate state files per environment and per service:
terraform/
networking/
prod/ -> s3://state/networking/prod/terraform.tfstate
staging/ -> s3://state/networking/staging/terraform.tfstate
compute/
prod/ -> s3://state/compute/prod/terraform.tfstate
staging/ -> s3://state/compute/staging/terraform.tfstate
database/
prod/ -> s3://state/database/prod/terraform.tfstate
staging/ -> s3://state/database/staging/terraform.tfstate
Now the networking team and the compute team can run terraform apply simultaneously without ever hitting a lock conflict. The blast radius of any state corruption is limited to a single service in a single environment.
Workspaces vs. Directory Structure
Terraform workspaces let you use a single configuration with multiple state files (one per workspace). This works well for identical environments that differ only in variables (dev, staging, prod with the same resources). Directory-based separation is better when environments have genuinely different configurations. I prefer the directory approach for production use because it makes the state separation explicit and visible in your repository structure.
Cross-Stack References
When one Terraform stack needs to reference outputs from another (for example, the compute stack needs the VPC ID from the networking stack), use remote state data sources:
data "terraform_remote_state" "networking" {
backend = "s3"
config = {
bucket = "mycompany-terraform-state"
key = "networking/prod/terraform.tfstate"
region = "us-east-1"
}
}
resource "aws_instance" "web" {
subnet_id = data.terraform_remote_state.networking.outputs.public_subnet_id
# ...
}
Terraform Backend Comparison
Choosing the right backend depends on your cloud provider, team size, and budget. Here is how the major options compare:
| Backend | Locking Mechanism | Auto-Unlock on Crash | Versioning | Cost | Setup Complexity |
|---|---|---|---|---|---|
| S3 + DynamoDB | DynamoDB conditional write | No (manual force-unlock or TTL) | Yes (S3 versioning) | ~$1/month | Medium |
| GCS | Object metadata lock | No (manual force-unlock) | Yes (object versioning) | ~$1/month | Low |
| Azure Blob | Blob lease | Yes (lease expires after 60s) | Yes (blob snapshots) | ~$1/month | Medium |
| Terraform Cloud | Built-in (API-managed) | Yes (UI + auto-timeout) | Yes (built-in) | Free tier / $20+/user | Low |
| Consul | Session-based KV lock | Yes (session TTL) | No | Self-hosted | High |
| PostgreSQL | pg_advisory_lock | Yes (connection drops = lock release) | No | Self-hosted / RDS cost | Medium |
Notable takeaway: Azure Blob, Terraform Cloud, Consul, and PostgreSQL all auto-release locks when the holding process crashes. S3 + DynamoDB and GCS do not, which is why orphaned locks are almost exclusively an AWS and GCP problem. If you are on Azure or using Terraform Cloud, you will rarely need force-unlock.
The Real Cost of State Corruption
I have seen state corruption incidents take anywhere from 2 hours to 2 full days to resolve, depending on the size of the infrastructure. The worst case I dealt with involved a production account with 400+ resources, a corrupted state file, and no S3 versioning enabled. The team had to manually inventory every resource in AWS, cross-reference it against the Terraform configuration, and run terraform import for each one. It took three engineers two days.
The cost calculation is straightforward. Three senior engineers at $80/hour for 16 hours each comes to $3,840 in labor alone. That does not include the opportunity cost of those engineers not working on feature delivery, the deployment freeze during recovery, or the risk of human error during manual imports introducing new inconsistencies.
Compare that to the cost of prevention: enable S3 versioning ($0.023/GB for version storage), use DynamoDB locking ($0 at low volume with PAY_PER_REQUEST), and configure CI/CD concurrency groups (free). Prevention costs almost nothing. Recovery costs thousands. If your state file contains secrets (and it almost certainly does), a corrupted or exposed state file is also a credential exposure incident.
Common Mistakes to Avoid
After years of Terraform incident response, these are the mistakes I see repeatedly:
- Force-unlocking while an apply is still running. This is the single most dangerous Terraform operation. Always verify the lock is orphaned before force-unlocking. Check CI dashboards, ask your team, and inspect the DynamoDB record.
- No S3 versioning on the state bucket. Without versioning, a corrupted state write is permanent. Enabling versioning after the fact does not recover previous versions. Do it when you create the bucket.
- No DynamoDB table configured. If you omit the dynamodb_table parameter in your backend config, Terraform runs without locking entirely. No error, no warning. Just silent concurrent write risk.
- Sharing a single state file across environments. Dev, staging, and prod should never share state. A bad apply to dev should never risk corrupting prod state. Use separate state keys or directories.
- Running Terraform locally when CI owns state. If your CI/CD pipeline is the designated Terraform executor, running terraform apply from your laptop creates a race condition. Either CI runs Terraform, or humans run Terraform. Not both.
- Not backing up before state surgery. Always run terraform state pull > backup.tfstate before any state rm, import, or state mv operation. If the surgery goes wrong, you can restore the backup.
Frequently Asked Questions
Is terraform force-unlock safe?
Yes, if and only if the lock is orphaned (the process that acquired it is no longer running). Force-unlocking an active lock will allow concurrent state writes, which can corrupt your state file. Always check your CI/CD dashboards and query the DynamoDB lock table before running force-unlock. When the lock is genuinely orphaned, force-unlock is completely safe and the intended recovery mechanism.
How do I prevent state lock errors in CI/CD?
Use pipeline-level concurrency controls. In GitHub Actions, set concurrency: group: terraform-prod with cancel-in-progress: false. In GitLab, use resource_group. In Atlantis, project-level locking is automatic. The goal is ensuring only one Terraform operation runs per state file at any given time. Also consider splitting large state files into smaller, service-scoped ones to reduce contention.
Can I recover a deleted or corrupted state file?
If S3 versioning is enabled, yes. Navigate to the S3 console, find your state file, view the version history, and restore a previous version. If versioning was not enabled, recovery requires manually importing every resource back into a fresh state file using terraform import. For large infrastructures, this can take days. This is why S3 versioning is non-negotiable for production state buckets.
What happens if two people run terraform apply at the same time?
If locking is configured correctly, the second person gets the "Error acquiring the state lock" message and their operation is blocked. No corruption occurs. If locking is not configured (no DynamoDB table), both operations proceed simultaneously. The second apply to finish writes its state, overwriting the first. Resources created by the first apply may be orphaned (they exist in AWS but not in state). This is extremely difficult to recover from.
Should I use S3 state backend or Terraform Cloud?
For small teams (under 5 engineers), S3 + DynamoDB is simpler and cheaper. You control the infrastructure, and the setup is a one-time effort. For larger teams, Terraform Cloud offers advantages: built-in lock management with auto-unlock, a web UI for viewing runs and state history, policy enforcement with Sentinel, and cost estimation. The free tier supports up to 500 managed resources. If your team has frequent lock conflicts or needs audit trails, Terraform Cloud pays for itself quickly.
The Bottom Line
Terraform state lock errors are a symptom, not the disease. The real problem is that your team lacks guardrails around concurrent Terraform operations. Fix the symptom with force-unlock when the lock is orphaned. Fix the disease with proper CI/CD concurrency controls, split state files, S3 versioning, and DynamoDB locking. The cost of setting this up correctly is measured in hours. The cost of recovering from state corruption is measured in days.
Related guides: AWS Security Checklist for Production, The Danger of Exposed .env Files, Terraform Beginner Guide, and 70+ free DevOps and security tools.
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools. Read more about the author.