GitHub Actions Workflow Failed: Debug and Fix Every Common Error
Your pipeline just went red. The commit looked fine, tests passed locally, and nothing changed in the workflow file. Yet GitHub Actions disagrees. Let us walk through every common failure, what causes it, and exactly how to fix it so you can get back to shipping code.
TL;DR: Click the failed job, then expand the red step and read the error line. Most common causes: YAML indentation mistakes, a secret name typo, missing permissions on GITHUB_TOKEN, a Docker build layer failure, or a flaky test. This guide covers eight categories of failure with copy-paste fixes for each one.
Step 1: Reading the Error Logs
Before you change anything, you need to actually read what GitHub is telling you. This sounds obvious, but most people glance at the red X and start guessing. Do not do that.
Here is how to get to the useful information quickly:
- Go to the Actions tab in your repository
- Click the failed workflow run (it will have a red X icon)
- On the left sidebar, click the specific job that failed
- Expand the step with the red X. The error message is right there
- If the log is long, use the search box at the top of the log viewer and search for Error, failed, or exit code
GitHub also provides a handy Annotations section at the top of the workflow run summary. This pulls out errors and warnings from the logs so you do not have to scroll through thousands of lines. Check there first.
One thing that trips people up: sometimes the real error is not in the step that failed. A step might fail because a previous step set up the environment incorrectly. If the error message in the failed step does not make sense, scroll up and check the output of earlier steps.
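To make that earlier state visible without re-running anything, you can add a step that only runs when something failed and prints what later steps depended on. A minimal sketch, assuming a Node.js project; the step name is illustrative, and it reads the default GITHUB_* environment variables rather than interpolating expressions into the script:

```yaml
name: ci
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm test
      # Runs only when an earlier step in this job failed.
      # Prints the environment the failing step actually saw.
      - name: Dump diagnostics on failure
        if: failure()
        run: |
          echo "Event: $GITHUB_EVENT_NAME  Ref: $GITHUB_REF  SHA: $GITHUB_SHA"
          grep PRETTY_NAME /etc/os-release
          node --version
          npm --version
          echo "PATH=$PATH"
          ls -la
```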
Step 2: YAML Syntax Errors
YAML is the most common source of workflow failures, and it is also the most frustrating. A single extra space, a missing colon, or a tab character can break your entire pipeline. YAML does not allow tabs for indentation. Ever. If your editor inserts tabs, switch to spaces immediately.
Common YAML mistakes
Wrong indentation depth. GitHub Actions uses a strict hierarchy: the jobs key is at the root, each job is nested one level under jobs, steps is nested under the job, and each step is a list item under steps. Two spaces per level is the convention; what YAML actually requires is that siblings line up at the same depth. Here is the correct structure:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test
Confusing env with with. This one gets everyone. The env key sets environment variables. The with key passes inputs to an action. They are not interchangeable:
# WRONG - trying to pass action inputs via env
- uses: actions/setup-node@v4
  env:
    node-version: '20'
# CORRECT - action inputs go under with
- uses: actions/setup-node@v4
  with:
    node-version: '20'
Missing quotes around values that look like numbers or booleans. YAML interprets on as a boolean true. If your branch is literally named on (unlikely but possible), you need quotes. More commonly, version strings like 3.10 get interpreted as the float 3.1. Always quote version numbers:
# WRONG - 3.10 becomes 3.1
python-version: 3.10
# CORRECT
python-version: '3.10'
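The same trap applies to build matrices, where an unquoted 3.10 silently becomes a request for Python 3.1. A minimal sketch, assuming a Python project tested with actions/setup-python; the version list is illustrative:

```yaml
name: test
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Quoted so YAML keeps them as strings; unquoted 3.10 would parse as 3.1
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: python --version
```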
Validate locally before pushing
Install actionlint to catch YAML errors before they hit CI. It understands GitHub Actions syntax specifically, not just generic YAML:
# Install on macOS
brew install actionlint
# Run against your workflow files
actionlint .github/workflows/*.yml
It catches things like invalid runs-on values, unknown action inputs, expression syntax errors, and shell script issues (via shellcheck, when it is installed). Run it in a pre-commit hook and broken workflow files get caught before they ever reach CI.
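One way to wire actionlint into a pre-commit hook is the pre-commit framework, which the actionlint repository supports. A minimal sketch of .pre-commit-config.yaml; the rev below is an assumption, so pin it to a release tag you have checked. The repository also documents hook variants that use Docker or an already-installed binary; check its docs for their IDs.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rhysd/actionlint
    rev: v1.7.1  # assumed release tag; pin to the version you actually use
    hooks:
      # Lints GitHub Actions workflow files on each commit
      - id: actionlint
```

After running pre-commit install once in the repository, the hook runs automatically on git commit.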
Step 3: Secrets Not Available
Your workflow references ${{ secrets.MY_API_KEY }} but the step fails because the value is empty. This is one of the most common issues and it has several causes.
Secret name typos fail silently
Secret names in GitHub are not case-sensitive, so api_key and API_KEY refer to the same secret. A typo is a different story: if you created a secret named API_KEY but reference API_KEYS or APIKEY, the expression resolves to an empty string. GitHub will not warn you about this. It just silently injects nothing.
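Because a misspelled secret quietly becomes an empty string, it can pay to fail fast with a clear message instead of letting a later step fail with a confusing authentication error. A minimal sketch; API_KEY is an illustrative secret name:

```yaml
name: check-secrets
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Verify required secrets are set
        env:
          # Resolves to an empty string if no secret with this name exists
          API_KEY: ${{ secrets.API_KEY }}
        run: |
          if [ -z "$API_KEY" ]; then
            echo "::error::Secret API_KEY is empty or not defined"
            exit 1
          fi
          echo "API_KEY is set"
```

The ::error:: prefix is a workflow command, so the message also shows up in the run's Annotations section.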
Repository secrets vs. environment secrets
GitHub has two types of secrets: repository secrets and environment secrets. If you created the secret under a specific environment (like production or staging), your job must declare that environment to access it:
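jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # Required to access environment secrets
    steps:
      - name: Deploy
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
        # Checks the secret is present without printing it
        run: test -n "$DEPLOY_KEY" && echo "Deploy key is available"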
Without the environment declaration, the job can only see repository-level secrets.
Fork pull requests cannot access secrets
This is a security feature, not a bug. When someone opens a pull request from a fork, GitHub does not expose your repository secrets to the workflow. This prevents a malicious contributor from adding a step that prints your secrets. If your CI requires secrets for tests (like a test API key), you have two options: use pull_request_target (carefully, with strict controls) or restructure your tests to work without real credentials by using mocks.
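If you restructure rather than reach for pull_request_target, one common pattern is to run the credential-free tests everywhere and skip only the job that needs secrets when the pull request comes from a fork. A minimal sketch; the job names and the test:integration npm script are hypothetical:

```yaml
name: ci
on: [push, pull_request]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test   # uses mocks, needs no secrets

  integration-tests:
    # Runs on push, and on pull requests whose head branch lives in this repository.
    # Fork PRs are skipped because their secrets would be empty anyway.
    if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run test:integration   # hypothetical script
        env:
          TEST_API_KEY: ${{ secrets.TEST_API_KEY }}
```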
GITHUB_TOKEN scope
The built-in GITHUB_TOKEN secret is automatically available in every workflow. But its default permissions depend on your repository settings. Go to Settings > Actions > General and check the Workflow permissions section. If it is set to "Read repository contents and packages permissions," the token cannot write anything. More on this in the next section.
Step 4: Permission Denied
You see errors like Resource not accessible by integration, 403 Forbidden, or Permission denied. This almost always comes down to the GITHUB_TOKEN not having the right permissions.
Declaring permissions explicitly
GitHub recommends declaring permissions explicitly in your workflow file. This follows the principle of least privilege and makes failures much easier to debug:
permissions:
  contents: read        # Checkout code
  packages: write       # Push to GHCR
  id-token: write       # OIDC for cloud deployments
  pull-requests: write  # Comment on PRs
  issues: write         # Create/update issues
You can set permissions at the workflow level (applies to all jobs) or at the job level (overrides workflow-level for that specific job). If you set any permission explicitly, all other permissions default to none. This is a common gotcha. You add packages: write and suddenly your checkout step fails because contents: read is no longer implied.
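The same rule applies when a job overrides the workflow: a job-level permissions block replaces the workflow-level one entirely instead of adding to it. A minimal sketch where only the publishing job gets write access; the image name is illustrative:

```yaml
name: build-and-publish
on:
  push:
    branches: [main]

# Default for every job: read-only access to repository contents
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest   # inherits contents: read
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm test

  publish:
    needs: test
    runs-on: ubuntu-latest
    # Replaces the workflow-level block, so contents: read must be restated
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - run: |
          docker build -t ghcr.io/myorg/myapp:latest .
          docker push ghcr.io/myorg/myapp:latest
```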
Common permission scenarios
- Pushing to the repo (committing generated files, updating version numbers): needs contents: write
- Pushing Docker images to GitHub Container Registry: needs packages: write
- Deploying to AWS/GCP/Azure with OIDC: needs id-token: write
- Commenting on pull requests: needs pull-requests: write
- Creating releases: needs contents: write
- Updating deployment status: needs deployments: write
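For the first scenario in that list, pushing generated files back to the repository, the job needs contents: write plus a git identity. A minimal sketch, assuming a hypothetical npm run generate script that writes into docs/; the email is the noreply address GitHub uses for the github-actions[bot] account:

```yaml
name: regenerate-docs
on:
  push:
    branches: [main]

permissions:
  contents: write   # required for git push with GITHUB_TOKEN

jobs:
  regenerate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # persists GITHUB_TOKEN for the later git push
      - run: npm ci
      - run: npm run generate       # hypothetical script that writes docs/
      - name: Commit and push if anything changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add docs/
          # Commit only when the staged tree differs from HEAD
          git diff --cached --quiet || git commit -m "chore: regenerate docs"
          git push
```

Pushes made with GITHUB_TOKEN do not trigger new workflow runs, which is what keeps this job from re-triggering itself on its own commit.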
Organization-level restrictions
Even if your workflow declares the right permissions, your organization might restrict what GITHUB_TOKEN can do. Check with your org admin if you see permission errors that your workflow-level declarations should cover.
Step 5: Docker Build Failures in CI
Docker builds that work perfectly on your laptop have a tendency to break in CI. The environment is different, the caching is different, and the network behavior is different.
Layer caching
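Without caching, every CI run builds every Docker layer from scratch. This is slow and wastes compute minutes. With Buildx you can store the layer cache in the GitHub Actions cache backend (type=gha). Pushing to GHCR also requires logging in first, and the job needs packages: write:
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3
- name: Log in to GitHub Container Registry
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}  # job needs packages: write
- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/myorg/myapp:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max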
The type=gha cache backend stores layers in the GitHub Actions cache, which is free and fast. It has a 10 GB limit per repository. If you hit that limit, older cache entries are evicted automatically.
Multi-platform build failures
Building for both linux/amd64 and linux/arm64 on a standard x64 runner requires QEMU emulation. Arm builds under emulation are often several times slower than native builds and can hit the job timeout. If your multi-platform build keeps timing out, consider building each platform in a separate job and combining the results into one multi-arch image with docker manifest (or docker buildx imagetools create).
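If you keep the single-job approach, QEMU has to be registered before Buildx starts, and an explicit timeout stops a stuck emulated build from running for hours. A minimal sketch, reusing the illustrative image name from the caching example; the 60-minute limit is an assumption to tune:

```yaml
name: docker-multiarch
on:
  push:
    branches: [main]

permissions:
  contents: read
  packages: write

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 60   # emulated arm64 builds are slow; fail instead of hanging
    steps:
      - uses: actions/checkout@v4
      # Registers QEMU binfmt handlers so the x64 runner can build arm64 images
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ghcr.io/myorg/myapp:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
```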
Context and Dockerfile location
The build context in CI is the runner workspace, not your laptop. If your Dockerfile references files with relative paths, make sure the context input is set correctly. A common mistake is having COPY . . in your Dockerfile but setting the wrong build context in the action.
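In a monorepo the Dockerfile often lives in a subdirectory, and the context has to point at the directory that COPY . . is meant to copy. A minimal sketch; the services/api path and the image tag are hypothetical:

```yaml
name: docker-api
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v5
        with:
          # COPY . . copies from this directory, not from the repository root
          context: ./services/api
          # Defaults to <context>/Dockerfile; stated explicitly for clarity
          file: ./services/api/Dockerfile
          push: false   # build check only, nothing is pushed
          tags: myapp-api:ci
```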
Step 6: Test Failures and Flakes
Your tests pass locally but fail in CI. Or they fail randomly, say one run in five. Flaky tests are one of the biggest productivity killers in CI/CD pipelines.
Why tests behave differently in CI
- Timing: CI runners have different CPU and I/O characteristics than your machine. Tests with tight timeouts or race conditions fail more often
- Environment: Different OS version, different installed packages, different locale settings
- Parallelism: If your test suite runs tests in parallel, resource contention on shared CI runners can cause failures
- Network: Tests that call external APIs may fail due to rate limiting or network latency differences
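Some of these differences can be removed by pinning the environment explicitly instead of inheriting the runner's defaults. A minimal sketch; the values (UTC, C.UTF-8, Node 20) are assumptions, so match whatever your local machine and production actually use:

```yaml
name: test
on: [push]

jobs:
  test:
    runs-on: ubuntu-latest
    env:
      TZ: UTC          # date/time tests should not depend on the runner's time zone
      LANG: C.UTF-8    # consistent locale for sorting and string formatting
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'   # same major version as local development
      - run: npm ci
      - run: npm test
```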
Dealing with flaky tests
Use the nick-fields/retry action to automatically retry flaky steps. This is a band-aid, not a fix, but it keeps your pipeline green while you investigate the root cause:
- uses: nick-fields/retry@v3
  with:
    max_attempts: 3
    timeout_minutes: 10
    command: npm test
Upload test artifacts for debugging
When tests fail in CI, you need the failure details. Upload test reports, screenshots, and logs as artifacts so you can download and inspect them:
- name: Upload test results
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: |
      test-results/
      screenshots/
    retention-days: 7
The if: failure() condition ensures artifacts are only uploaded when something goes wrong, saving storage.
Step 7: Timeout Exceeded
The default timeout for a GitHub Actions job is 360 minutes (6 hours). If your job hits this limit, something is very wrong. But even shorter hangs of 30 to 60 minutes waste your CI minutes and block your pipeline.
Set explicit timeouts
Always set timeout-minutes on your jobs. A build that normally takes 5 minutes should not be allowed to run for 6 hours:
jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
Common causes of hangs
- Interactive prompts: A command waiting for user input (like apt install without -y, or npm init without --yes)
- Deadlocked processes: Two processes waiting for each other (common in integration tests with databases)
- Infinite loops: A script or test stuck in a retry loop with no exit condition
- Network waits: Downloading a large file from a slow or unresponsive server
- Docker pull rate limits: Anonymous Docker Hub pulls are rate-limited. If the runner IP is rate-limited, pulls hang or fail. Authenticate with Docker Hub or use a mirror
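Several of these causes can be defused in the workflow itself: make package installs non-interactive, put a tighter timeout on the step that tends to hang, and authenticate Docker Hub pulls. A minimal sketch; the DOCKERHUB_USERNAME and DOCKERHUB_TOKEN secrets and the test:integration script are hypothetical:

```yaml
name: integration
on: [push]

jobs:
  integration:
    runs-on: ubuntu-latest
    timeout-minutes: 20   # hard cap for the whole job
    steps:
      - uses: actions/checkout@v4
      # -y answers apt's confirmation prompt; DEBIAN_FRONTEND suppresses dpkg dialogs.
      # It is passed after sudo because sudo resets the environment by default.
      - name: Install system packages
        run: |
          sudo apt-get update
          sudo DEBIAN_FRONTEND=noninteractive apt-get install -y postgresql-client
      # Authenticated pulls get a higher Docker Hub rate limit than anonymous ones
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - run: npm ci
      - name: Run integration tests
        timeout-minutes: 10   # step-level cap, tighter than the job's
        run: npm run test:integration
```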
Step 8: Runner Issues
Sometimes the problem is not your workflow. It is the machine running it.
ubuntu-latest is a moving target
When you specify runs-on: ubuntu-latest, GitHub periodically updates what "latest" means. One day your workflow runs on Ubuntu 22.04, the next it is on 24.04. If your build depends on a specific system library version, a tool that ships with the runner, or a kernel feature, this silent upgrade can break things.
The fix is to pin to a specific version:
# Instead of this
runs-on: ubuntu-latest
# Pin to a specific version
runs-on: ubuntu-22.04
Check the actions/runner-images repository on GitHub for the full list of installed software on each runner version.
Self-hosted runner offline
If you use self-hosted runners, your job stays queued when no matching runner is online. It shows "Waiting for a runner to pick up this job" and does not progress; GitHub only fails it after it has been queued for 24 hours. Check that your runner is actually running, that its labels match the runs-on value in your workflow, and that it has not been removed from the repository or organization.
# Check runner status
# Settings > Actions > Runners
# Or via API:
gh api repos/{owner}/{repo}/actions/runners
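The label match is the part that most often goes wrong: a job is only assigned to a runner that carries every label listed in runs-on. A minimal sketch; gpu is a hypothetical custom label you would have added when registering the runner, while self-hosted, linux, and x64 are the default labels a Linux x64 runner gets:

```yaml
name: gpu-tests
on: [push]

jobs:
  gpu-tests:
    # A single online runner must carry ALL of these labels,
    # otherwise the job sits in the queue
    runs-on: [self-hosted, linux, x64, gpu]
    timeout-minutes: 30   # limits run time once picked up, not queue time
    steps:
      - uses: actions/checkout@v4
      - run: nvidia-smi
```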
Runner version mismatch
Self-hosted runners update themselves automatically by default; if you disabled automatic updates, you need to update them regularly yourself. If the runner version is too old, it may not support newer workflow features. GitHub shows a warning in the workflow logs when a runner is outdated. Update it by downloading the latest runner package from the releases page.
Step 9: Caching Not Working
You set up actions/cache but your builds are not getting faster. The cache miss rate is 100%. Here is why.
Cache key mismatch
The cache key must match exactly. If you use a hash of your lockfile as part of the key, any change to the lockfile invalidates the cache. This is by design, but if your lockfile changes frequently (maybe a bot is updating dependencies), your cache hit rate drops to zero:
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-
The restore-keys field is crucial. When the exact key does not match, GitHub falls back to the most recent cache entry that matches the prefix. Without restore-keys, every lockfile change means a full cache miss.
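If all you need is the package manager's download cache, the setup actions can build the cache key for you. A minimal sketch using the cache input of actions/setup-node; whether it also falls back to older entries the way restore-keys does depends on the action, so check its documentation if partial hits matter to you:

```yaml
name: ci
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'   # caches npm's cache directory, keyed on the lockfile hash
      - run: npm ci
      - run: npm test
```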
Cache eviction
GitHub evicts cache entries that have not been accessed in over 7 days. The total cache size per repository is limited to 10 GB. If your project has many branches each generating their own caches, you can hit this limit quickly. Consolidate cache keys where possible and use the GitHub CLI's gh cache list command to see what is consuming your cache quota.
Cache scope
Caches are scoped to the branch that created them. A run on a feature branch can restore caches created on that same branch or on the default branch (main/master), but not caches from other feature branches, and the default branch cannot restore caches created on feature branches. Because every branch can read default-branch caches, a cache-warming job on your default branch gives you maximum cache reuse.
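A cache-warming job can be as small as a workflow that runs on the default branch and uses the same key scheme as the feature-branch workflows, so they find its entries. A minimal sketch reusing the npm key from the actions/cache example above; the daily schedule is an assumption chosen to stay well inside the 7-day eviction window:

```yaml
name: warm-cache
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 3 * * *'   # scheduled runs use the default branch

jobs:
  warm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Restores on an exact hit; otherwise saves a fresh entry when the job ends
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-
      - run: npm ci
```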
Pro Tip: Run GitHub Actions Locally with act
nektos/act is an open-source tool that runs your GitHub Actions workflows locally using Docker. It catches many issues without waiting for CI. Install it and run it before every push:
# Install
brew install act # macOS
choco install act-cli # Windows
sudo snap install act # Linux
# Run the default push event
act
# Run a specific job
act -j build
# Pass secrets via .env file
act --secret-file .secrets
# Use a specific runner image
act -P ubuntu-latest=catthehacker/ubuntu:full-latest
Limitations: act does not support OIDC tokens, service containers (such as a PostgreSQL service) may behave differently than on GitHub-hosted runners, and some GitHub-specific context variables are missing. But for YAML validation, script testing, and environment debugging, it is invaluable.
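act sets an ACT environment variable in the jobs it runs, so steps that cannot work locally, such as an OIDC-based cloud login, can be skipped under act and still run on GitHub. A minimal sketch; the AWS role ARN and the build script are placeholders:

```yaml
name: deploy
on: [push]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # env.ACT is empty on GitHub, so this step runs there and is skipped under act
      - name: Configure AWS credentials
        if: ${{ !env.ACT }}
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy-role
          aws-region: us-east-1
      - run: npm ci && npm run build
```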
Comparison: GitHub Actions vs. Other CI/CD Platforms
If you are evaluating CI/CD platforms or migrating between them, here is how they stack up on the issues covered in this article:
| Feature | GitHub Actions | GitLab CI | CircleCI | Jenkins |
|---|---|---|---|---|
| Config format | YAML (.github/workflows/) | YAML (.gitlab-ci.yml) | YAML (.circleci/config.yml) | Groovy (Jenkinsfile) |
| Free tier | 2,000 min/month (public unlimited) | 400 min/month | 6,000 min/month | Self-hosted (free OSS) |
| Secrets management | Repo + environment secrets | Project + group variables | Contexts + project env vars | Credentials plugin |
| Caching | actions/cache (10 GB limit) | Built-in cache directive | Built-in cache (several GB) | Plugin-based |
| Docker support | Native (buildx, GHCR) | Native (DinD, registry) | Native (remote Docker) | Plugin-based |
| OIDC cloud auth | Built-in id-token | Built-in id_tokens (replaced CI_JOB_JWT) | OIDC support | Plugin-based |
| Local testing | nektos/act | gitlab-runner exec (deprecated) | circleci local execute | Run locally (native) |
| Marketplace | 20,000+ actions | CI/CD components catalog | Orbs registry | 1,800+ plugins |
GitHub Actions wins on ecosystem size and tight GitHub integration. GitLab CI wins on built-in features (no marketplace needed for common tasks). CircleCI has the most generous free tier. Jenkins gives you full control but requires you to manage everything yourself.
Security Best Practices for GitHub Actions
A compromised CI/CD pipeline can give an attacker access to your production environment, cloud credentials, and source code. Treat your workflow files with the same care as your application code.
Pin actions to SHA, not tags
When you write uses: actions/checkout@v4, the v4 tag can be moved by the action maintainer to point to a different commit at any time. If the maintainer's account is compromised, an attacker could push malicious code and retag. Pin to the full commit SHA instead:
# Vulnerable to tag manipulation
- uses: actions/checkout@v4
# Pinned to specific commit - safe
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
Use StepSecurity Secure Repo or pinact to automate SHA pinning across all your workflow files.
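Pinned SHAs do not update themselves, so pair pinning with something that proposes updates. A minimal sketch of a Dependabot configuration for the github-actions ecosystem, which opens pull requests when newer releases of the actions in your workflows are published:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"   # Dependabot finds .github/workflows from the repository root
    schedule:
      interval: "weekly"
```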
Protect workflow files with CODEOWNERS
Add .github/workflows/ to your CODEOWNERS file and enable "Require review from Code Owners" in branch protection, so that any changes to workflow files require approval from a security-aware team member:
# .github/CODEOWNERS
.github/workflows/ @your-org/platform-team
Use OpenSSF Scorecard
The OpenSSF Scorecard project automatically evaluates your repository's security practices, including CI/CD configuration. Run it as a GitHub Action to get a security score and actionable recommendations. It checks for pinned dependencies, branch protection, token permissions, and more.
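A minimal sketch of running Scorecard as a workflow, modeled on the pattern in the action's README; the action versions and the schedule are assumptions, so check the README for current releases and pin them to SHAs as described above:

```yaml
name: scorecard
on:
  push:
    branches: [main]
  schedule:
    - cron: '30 1 * * 6'   # weekly

# Read-only default; the job below requests only the extra scopes it needs
permissions: read-all

jobs:
  analysis:
    runs-on: ubuntu-latest
    # Job-level block replaces read-all, so read scopes are restated
    permissions:
      contents: read
      actions: read
      security-events: write   # upload SARIF results to code scanning
      id-token: write          # needed when publish_results is true
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: ossf/scorecard-action@v2.3.1
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: true
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```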
Frequently Asked Questions
Why does my GitHub Actions workflow pass on push but fail on pull_request?
The most common reason is secrets availability. For pull requests from forks, GitHub does not inject repository secrets for security reasons. Your workflow might pass on push (where secrets are available) but fail on pull_request because environment variables resolve to empty strings. Another cause is the GITHUB_TOKEN permission scope, which is read-only for fork PRs by default. Check your workflow for any step that writes to the repo or calls an API that needs write access.
How do I debug a GitHub Actions workflow locally without pushing every change?
Use nektos/act, an open-source tool that runs GitHub Actions locally using Docker. Install it with brew install act (macOS) or your package manager, then run act in your repo root. It reads your .github/workflows/ files and executes them in Docker containers that mimic GitHub-hosted runners. It does not support every feature (like OIDC or certain GitHub-hosted services), but it catches YAML errors, script bugs, and environment issues before you push.
Can I re-run only the failed jobs in a GitHub Actions workflow?
Yes. Go to the Actions tab, click the failed workflow run, and click Re-run failed jobs in the top right. This skips any jobs that already passed and only re-executes the ones that failed. This is useful for flaky tests or transient network errors. You can also re-run a single job by clicking on it and selecting Re-run this job. Note that re-runs use the same commit SHA and workflow file from the original run, not the latest version on the branch.
The Bottom Line
GitHub Actions failures are rarely mysterious once you know where to look. Start with the error logs (Step 1), then work through the categories: YAML syntax, secrets, permissions, Docker, tests, timeouts, runners, and caching. Most failures fall into one of these eight buckets, and the fix is usually a one- or two-line change.
The single best investment you can make is setting up act for local testing and actionlint for YAML validation. Together, they catch the majority of issues before you even push. Your CI minutes (and your teammates) will thank you.
Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools.