GitHub Actions Workflow Failed: Debug and Fix Every Common Error

Your pipeline just went red. The commit looked fine, tests passed locally, and nothing changed in the workflow file. Yet GitHub Actions disagrees. Let us walk through every common failure, what causes it, and exactly how to fix it so you can get back to shipping code.

TL;DR: Click the failed job, then expand the red step and read the error line. Most common causes: YAML indentation mistakes, a secret name typo, missing permissions on GITHUB_TOKEN, a Docker build layer failure, or a flaky test. This guide covers log reading plus eight categories of failure, with copy-paste fixes for each one.

Step 1: Reading the Error Logs

Before you change anything, you need to actually read what GitHub is telling you. This sounds obvious, but most people glance at the red X and start guessing. Do not do that.

Here is how to get to the useful information quickly:

  1. Go to the Actions tab in your repository
  2. Click the failed workflow run (it will have a red X icon)
  3. On the left sidebar, click the specific job that failed
  4. Expand the step with the red X. The error message is right there
  5. If the log is long, use the search box at the top of the log viewer. Search for Error, failed, or exit code

GitHub also provides a handy Annotations section at the top of the workflow run summary. This pulls out errors and warnings from the logs so you do not have to scroll through thousands of lines. Check there first.

One thing that trips people up: sometimes the real error is not in the step that failed. A step might fail because a previous step set up the environment incorrectly. If the error message in the failed step does not make sense, scroll up and check the output of earlier steps.
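
When the failing step's message does not explain itself, a diagnostic step that runs only after a failure can show what state the earlier steps left behind. The following is a minimal sketch, assuming a Node project tested with npm test; the variables it prints are default environment variables on GitHub-hosted runners.

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Run tests
    run: npm test
  # Runs only if an earlier step in this job failed
  - name: Print environment details after a failure
    if: failure()
    run: |
      echo "Runner: $RUNNER_OS / $RUNNER_ARCH"
      echo "Event:  $GITHUB_EVENT_NAME on $GITHUB_REF"
      echo "Workspace: $GITHUB_WORKSPACE"
      ls -la "$GITHUB_WORKSPACE"
```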

Step 2: YAML Syntax Errors

YAML is the most common source of workflow failures, and it is also the most frustrating. A single extra space, a missing colon, or a tab character can break your entire pipeline. YAML does not use tabs. Ever. If your editor inserts tabs, switch to spaces immediately.

Common YAML mistakes

Wrong indentation depth. GitHub Actions uses a strict hierarchy: the jobs key is at the root, each job is indented two spaces, steps is indented under the job, and each step is indented under steps. Here is the correct structure:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test

Confusing env with with. This one gets everyone. The env key sets environment variables. The with key passes inputs to an action. They are not interchangeable:

# WRONG - trying to pass action inputs via env
- uses: actions/setup-node@v4
  env:
    node-version: '20'

# CORRECT - action inputs go under with
- uses: actions/setup-node@v4
  with:
    node-version: '20'

Missing quotes around values that look like numbers or booleans. YAML interprets on as a boolean true. If your branch is literally named on (unlikely but possible), you need quotes. More commonly, version strings like 3.10 get interpreted as the float 3.1. Always quote version numbers:

# WRONG - 3.10 becomes 3.1
python-version: 3.10

# CORRECT
python-version: '3.10'

Validate locally before pushing

Install actionlint to catch YAML errors before they hit CI. It understands GitHub Actions syntax specifically, not just generic YAML:

# Install on macOS
brew install actionlint

# Run against your workflow files
actionlint .github/workflows/*.yml

It catches things like invalid runs-on values, unknown action inputs, expression syntax errors, and shell script issues. Run it in a pre-commit hook to stop most broken workflow files before they are pushed.
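
One way to wire actionlint into a pre-commit hook is through the pre-commit framework. This is a sketch, assuming you already use pre-commit; the rev value is a placeholder and should be replaced with whichever actionlint release is current.

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/rhysd/actionlint
    rev: v1.7.7  # assumption: pin to the current release tag
    hooks:
      - id: actionlint  # lints files under .github/workflows/
```

After adding the file, pre-commit install registers the hook so it runs on every commit.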

Step 3: Secrets Not Available

Your workflow references ${{ secrets.MY_API_KEY }} but the step fails because the value is empty. This is one of the most common issues and it has several causes.

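Secret name typos

Secret names in GitHub are not case-sensitive, so API_KEY and api_key refer to the same secret. A misspelled name is a different matter: if you created a secret named API_KEY but reference it as API_KEYS or APIKEY, it will resolve to an empty string. GitHub will not warn you about this. It just silently injects nothing. Case does matter once you map the secret to an environment variable, because variable names in the shell on Linux and macOS runners are case-sensitive.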
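
Because an unresolved secret expands to an empty string instead of raising an error, it can help to fail fast with an explicit check. A minimal sketch, using the MY_API_KEY name from the example above:

```yaml
- name: Fail fast if the secret is missing
  env:
    MY_API_KEY: ${{ secrets.MY_API_KEY }}
  run: |
    # An empty value usually means a misspelled name or a secret
    # defined in a scope this job cannot see
    if [ -z "$MY_API_KEY" ]; then
      echo "::error::MY_API_KEY is empty. Check the secret name and where it is defined."
      exit 1
    fi
```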

Repository secrets vs. environment secrets

GitHub stores secrets at three levels: organization, repository, and environment. If you created the secret under a specific environment (like production or staging), your job must declare that environment to access it:

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # Required to access environment secrets
    steps:
      - run: echo "Deploying with ${{ secrets.DEPLOY_KEY }}"

Without the environment declaration, the job can only see organization-level and repository-level secrets.

Fork pull requests cannot access secrets

This is a security feature, not a bug. When someone opens a pull request from a fork, GitHub does not expose your repository secrets to the workflow. This prevents a malicious contributor from adding a step that prints your secrets. If your CI requires secrets for tests (like a test API key), you have two options: use pull_request_target (carefully, with strict controls) or restructure your tests to work without real credentials by using mocks.
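
A third, lighter option is to skip the secret-dependent step on fork pull requests and let the rest of the job run. The sketch below assumes a hypothetical npm script named test:integration and a hypothetical secret named TEST_API_KEY.

```yaml
- name: Run integration tests (needs secrets)
  # True for pushes and for pull requests opened from the same repository
  if: github.event_name != 'pull_request' || github.event.pull_request.head.repo.full_name == github.repository
  env:
    TEST_API_KEY: ${{ secrets.TEST_API_KEY }}
  run: npm run test:integration
```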

GITHUB_TOKEN scope

The built-in GITHUB_TOKEN secret is automatically available in every workflow. But its default permissions depend on your repository settings. Go to Settings > Actions > General and check the Workflow permissions section. If it is set to "Read repository contents and packages permissions," the token cannot write anything. More on this in the next section.

Step 4: Permission Denied

You see errors like Resource not accessible by integration, 403 Forbidden, or Permission denied. This almost always comes down to the GITHUB_TOKEN not having the right permissions.

Declaring permissions explicitly

GitHub recommends declaring permissions explicitly in your workflow file. This follows the principle of least privilege and makes failures much easier to debug:

permissions:
  contents: read      # Checkout code
  packages: write     # Push to GHCR
  id-token: write     # OIDC for cloud deployments
  pull-requests: write # Comment on PRs
  issues: write       # Create/update issues

You can set permissions at the workflow level (applies to all jobs) or at the job level (overrides workflow-level for that specific job). If you set any permission explicitly, all other permissions default to none. This is a common gotcha. You add packages: write and suddenly your checkout step fails because contents: read is no longer implied.
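
The workflow-level and job-level scopes can be combined. The following sketch, with illustrative job names, keeps every job read-only by default and grants packages: write only to the job that needs it. Note that the job-level block repeats contents: read, because it replaces the workflow-level block instead of adding to it.

```yaml
permissions:
  contents: read  # default for every job in this workflow

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test

  publish:
    runs-on: ubuntu-latest
    permissions:
      contents: read   # must be restated, or checkout loses access
      packages: write  # needed only here
    steps:
      - uses: actions/checkout@v4
```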

Common permission scenarios

  • Pushing to the repo (committing generated files, updating version numbers): needs contents: write
  • Pushing Docker images to GitHub Container Registry: needs packages: write
  • Deploying to AWS/GCP/Azure with OIDC: needs id-token: write
  • Commenting on pull requests: needs pull-requests: write
  • Creating releases: needs contents: write
  • Updating deployment status: needs deployments: write
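
As one worked instance of the scenarios above, here is a sketch of a job that comments on a pull request with the GitHub CLI, which is preinstalled on GitHub-hosted runners. The job name and comment text are illustrative. On pull requests from forks the token stays read-only, so this step would still fail there.

```yaml
on: pull_request

jobs:
  comment:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - name: Comment on the pull request
        run: gh pr comment "$PR_NUMBER" --repo "$GITHUB_REPOSITORY" --body "Build finished."
        env:
          GH_TOKEN: ${{ github.token }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
```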

Organization-level restrictions

Even if your workflow declares the right permissions, your organization might restrict what GITHUB_TOKEN can do. Check with your org admin if you see permission errors that your workflow-level declarations should cover.

Step 5: Docker Build Failures in CI

Docker builds that work perfectly on your laptop have a tendency to break in CI. The environment is different, the caching is different, and the network behavior is different.

Layer caching

Without caching, every CI run builds every Docker layer from scratch. This is slow and wastes compute minutes. Use BuildKit with the GitHub Actions cache backend (type=gha) to reuse layers between runs:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build and push
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: ghcr.io/myorg/myapp:latest
    cache-from: type=gha
    cache-to: type=gha,mode=max

The type=gha cache backend stores layers in the GitHub Actions cache, which is free and fast. It has a 10 GB limit per repository. If you hit that limit, older cache entries are evicted automatically.
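
A build step with push: true also needs registry credentials, or the push fails with an authentication error. A minimal sketch for GitHub Container Registry, assuming the job has packages: write and that this step runs before the build step:

```yaml
- name: Log in to GHCR
  uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
```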

Multi-platform build failures

Building for both linux/amd64 and linux/arm64 on an x86 runner requires QEMU emulation. Arm builds under emulation can be 5 to 10 times slower than native builds and sometimes hit the job timeout. If your multi-platform build keeps timing out, consider building each platform in a separate job and combining the results into one multi-platform tag with docker buildx imagetools create (or docker manifest).
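
For reference, the emulated single-job setup looks roughly like this. It is a sketch that builds without pushing, which makes it a cheap way to measure how long the emulated Arm build takes before deciding whether to split it up.

```yaml
- name: Set up QEMU
  uses: docker/setup-qemu-action@v3

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build for both platforms
  uses: docker/build-push-action@v5
  with:
    context: .
    platforms: linux/amd64,linux/arm64
    push: false
```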

Context and Dockerfile location

The build context in CI is the runner workspace, not your laptop. If your Dockerfile references files with relative paths, make sure the context input is set correctly. A common mistake is having COPY . . in your Dockerfile but setting the wrong build context in the action.
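
When the Dockerfile lives in a subdirectory, setting both context and file explicitly removes the ambiguity. The services/api path below is a hypothetical layout; COPY paths inside the Dockerfile are then resolved relative to context.

```yaml
- name: Build the API image
  uses: docker/build-push-action@v5
  with:
    context: ./services/api          # hypothetical path: root for COPY instructions
    file: ./services/api/Dockerfile  # hypothetical path: Dockerfile to build
    push: false
```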

Step 6: Test Failures and Flakes

Your tests pass locally but fail in CI. Or they fail randomly one out of every five runs. Flaky tests are the number one productivity killer in CI/CD pipelines.

Why tests behave differently in CI

  • Timing: CI runners have different CPU and I/O characteristics than your machine. Tests with tight timeouts or race conditions fail more often
  • Environment: Different OS version, different installed packages, different locale settings
  • Parallelism: If your test suite runs tests in parallel, resource contention on shared CI runners can cause failures
  • Network: Tests that call external APIs may fail due to rate limiting or network latency differences
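
Some of these differences can be reduced by pinning the environment in the workflow itself. The following is a sketch for a Node project; the timezone and locale values are assumptions, and the point is to set them to one fixed value in CI and match them locally.

```yaml
jobs:
  test:
    runs-on: ubuntu-22.04  # fixed OS image instead of ubuntu-latest
    env:
      TZ: UTC              # same timezone on every run
      LC_ALL: C.UTF-8      # same locale on every run
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm test
```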

Dealing with flaky tests

Use the nick-fields/retry action to automatically retry flaky steps. This is a band-aid, not a fix, but it keeps your pipeline green while you investigate the root cause:

- uses: nick-fields/retry@v3
  with:
    max_attempts: 3
    timeout_minutes: 10
    command: npm test

Upload test artifacts for debugging

When tests fail in CI, you need the failure details. Upload test reports, screenshots, and logs as artifacts so you can download and inspect them:

- name: Upload test results
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: test-results
    path: |
      test-results/
      screenshots/
    retention-days: 7

The if: failure() condition ensures artifacts are only uploaded when something goes wrong, saving storage.

Step 7: Timeout Exceeded

The default timeout for a GitHub Actions job is 360 minutes (6 hours). If your job hits this limit, something is very wrong. But even shorter hangs of 30 to 60 minutes waste your CI minutes and block your pipeline.

Set explicit timeouts

Always set timeout-minutes on your jobs. A build that normally takes 5 minutes should not be allowed to run for 6 hours:

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test

Common causes of hangs

  • Interactive prompts: A command waiting for user input (like apt install without -y or npm init without --yes)
  • Deadlocked processes: Two processes waiting for each other (common in integration tests with databases)
  • Infinite loops: A script or test stuck in a retry loop with no exit condition
  • Network waits: Downloading a large file from a slow or unresponsive server
  • Docker pull rate limits: Anonymous Docker Hub pulls are rate-limited. If the runner IP is rate-limited, pulls hang or fail. Authenticate with Docker Hub or use a mirror
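
Two of these causes have direct workflow-level mitigations. The sketch below assumes hypothetical secrets named DOCKERHUB_USERNAME and DOCKERHUB_TOKEN, and uses jq only as a sample package. The step-level timeout-minutes caps a single step without affecting the rest of the job.

```yaml
- name: Log in to Docker Hub to avoid anonymous pull limits
  uses: docker/login-action@v3
  with:
    username: ${{ secrets.DOCKERHUB_USERNAME }}
    password: ${{ secrets.DOCKERHUB_TOKEN }}

- name: Install packages without prompts
  run: sudo apt-get update && sudo apt-get install -y jq
  env:
    DEBIAN_FRONTEND: noninteractive
  timeout-minutes: 5  # fail this step early instead of hanging the job
```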

Step 8: Runner Issues

Sometimes the problem is not your workflow. It is the machine running it.

ubuntu-latest is a moving target

When you specify runs-on: ubuntu-latest, GitHub periodically updates what "latest" means. One day your workflow runs on Ubuntu 22.04, the next it is on 24.04. If your build depends on a specific system library version, a tool that ships with the runner, or a kernel feature, this silent upgrade can break things.

The fix is to pin to a specific version:

# Instead of this
runs-on: ubuntu-latest

# Pin to a specific version
runs-on: ubuntu-22.04

Check the runner-images repository for the full list of installed software on each runner version.

Self-hosted runner offline

If you use self-hosted runners, your workflow will queue indefinitely when no matching runner is online. The job shows "Waiting for a runner to pick up this job" and never progresses. Check that your runner is actually running, that its labels match the runs-on value in your workflow, and that it has not been removed from the repository or organization.

# Check runner status
# Settings > Actions > Runners
# Or via API:
gh api repos/{owner}/{repo}/actions/runners
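
On the workflow side, a job is only picked up by a runner that carries every label listed in runs-on. A short sketch with illustrative labels:

```yaml
jobs:
  build:
    # The runner must have all three labels, or the job stays queued
    runs-on: [self-hosted, linux, x64]
    steps:
      - uses: actions/checkout@v4
```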

Runner version mismatch

Self-hosted runners need to be updated regularly. If the runner version is too old, it may not support newer workflow features. GitHub shows a warning in the workflow logs when a runner is outdated. Update it by downloading the latest runner package from the releases page.

Step 9: Caching Not Working

You set up actions/cache but your builds are not getting faster. The cache miss rate is 100%. Here is why.

Cache key mismatch

The cache key must match exactly. If you use a hash of your lockfile as part of the key, any change to the lockfile invalidates the cache. This is by design, but if your lockfile changes frequently (maybe a bot is updating dependencies), your cache hit rate drops to zero:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

The restore-keys field is crucial. When the exact key does not match, GitHub falls back to the most recent cache entry that matches the prefix. Without restore-keys, every lockfile change means a full cache miss.
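
To see which case you are hitting, give the cache step an id and print its cache-hit output. In this sketch the step id npm-cache is an arbitrary name; the output is true only when the exact key matched, so any other value means a fallback restore or a miss.

```yaml
- uses: actions/cache@v4
  id: npm-cache
  with:
    path: ~/.npm
    key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-${{ runner.os }}-

- name: Report cache result
  run: echo "Exact cache hit: ${{ steps.npm-cache.outputs.cache-hit }}"
```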

Cache eviction

GitHub evicts cache entries that have not been accessed in over 7 days. The total cache size per repository is limited to 10 GB. If your project has many branches each generating their own caches, you can hit this limit quickly. Consolidate cache keys where possible and use the gh cache list command to see what is consuming your cache quota.

Cache scope

Caches are scoped to the branch where they were created. A workflow run can restore caches created on its own branch or on the default branch (and, for pull requests, on the base branch). A cache created on a feature branch is therefore not visible to the default branch or to sibling feature branches. Caches created on the default branch (main/master) are accessible from all branches. If you want maximum cache reuse, run a cache-warming job on your default branch.
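
A cache-warming job can be as small as the sketch below. It assumes the default branch is named main and that dependencies are installed with npm ci; because it runs on the default branch, the cache it saves is available to every other branch.

```yaml
name: Warm cache
on:
  push:
    branches: [main]

jobs:
  warm:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ runner.os }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            npm-${{ runner.os }}-
      # Populates ~/.npm; the cache is saved when the job succeeds
      - run: npm ci
```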

Pro Tip: Run GitHub Actions Locally with act

nektos/act is an open-source tool that runs your GitHub Actions workflows locally using Docker. It catches many issues without waiting for CI. Install it and run it before you push:

# Install
brew install act       # macOS
choco install act-cli  # Windows
sudo snap install act  # Linux

# Run the default push event
act

# Run a specific job
act -j build

# Pass secrets from a file (KEY=value lines, same format as .env)
act --secret-file .secrets

# Use a specific runner image
act -P ubuntu-latest=catthehacker/ubuntu:full-latest

Limitations: act does not support OIDC tokens or some GitHub-specific context variables, and service containers (like PostgreSQL) behave slightly differently than on GitHub-hosted runners. But for YAML validation, script testing, and environment debugging, it is invaluable.
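
Steps that cannot work locally can be skipped when running under act, which sets an ACT environment variable. A sketch, with ./deploy.sh as a hypothetical script:

```yaml
- name: Deploy (skipped when running under act)
  if: ${{ !env.ACT }}
  run: ./deploy.sh
```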

Comparison: GitHub Actions vs. Other CI/CD Platforms

If you are evaluating CI/CD platforms or migrating between them, here is how they stack up on the issues covered in this article:

| Feature | GitHub Actions | GitLab CI | CircleCI | Jenkins |
|---|---|---|---|---|
| Config format | YAML (.github/workflows/) | YAML (.gitlab-ci.yml) | YAML (.circleci/config.yml) | Groovy (Jenkinsfile) |
| Free tier | 2,000 min/month (public unlimited) | 400 min/month | 6,000 min/month | Self-hosted (free OSS) |
| Secrets management | Repo + environment secrets | Project + group variables | Contexts + project env vars | Credentials plugin |
| Caching | actions/cache (10 GB limit) | Built-in cache directive | Built-in cache (several GB) | Plugin-based |
| Docker support | Native (buildx, GHCR) | Native (DinD, registry) | Native (remote Docker) | Plugin-based |
| OIDC cloud auth | Built-in id-token | Built-in id_tokens | OIDC support | Plugin-based |
| Local testing | nektos/act | gitlab-runner exec (deprecated) | circleci local execute | Run locally (native) |
| Marketplace | 20,000+ actions | CI/CD components catalog | Orbs registry | 1,800+ plugins |

GitHub Actions wins on ecosystem size and tight GitHub integration. GitLab CI wins on built-in features (no marketplace needed for common tasks). CircleCI has the most generous free tier. Jenkins gives you full control but requires you to manage everything yourself.

Security Best Practices for GitHub Actions

A compromised CI/CD pipeline can give an attacker access to your production environment, cloud credentials, and source code. Treat your workflow files with the same care as your application code.

Pin actions to SHA, not tags

When you write uses: actions/checkout@v4, the v4 tag can be moved by the action maintainer to point to a different commit at any time. If the maintainer's account is compromised, an attacker could push malicious code and retag. Pin to the full commit SHA instead:

# Vulnerable to tag manipulation
- uses: actions/checkout@v4

# Pinned to specific commit - safe
- uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

Use StepSecurity Secure Repo or pinact to automate SHA pinning across all your workflow files.
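
Pinned SHAs go stale, so it is worth pairing them with automated updates. A minimal Dependabot configuration for workflow files might look like this; the weekly interval is an arbitrary choice.

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"  # Dependabot looks in .github/workflows/
    schedule:
      interval: "weekly"
```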

Protect workflow files with CODEOWNERS

Add .github/workflows/ to your CODEOWNERS file so that any changes to workflow files require approval from a security-aware team member:

# .github/CODEOWNERS
.github/workflows/ @your-org/platform-team

Use OpenSSF Scorecard

The OpenSSF Scorecard project automatically evaluates your repository's security practices, including CI/CD configuration. Run it as a GitHub Action to get a security score and actionable recommendations. It checks for pinned dependencies, branch protection, token permissions, and more.
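
A Scorecard workflow could look roughly like the sketch below. The schedule, the branch name, and the action version tags are assumptions; check the scorecard-action README for the current release and pin it to a SHA as described above. Results are uploaded to code scanning instead of being published publicly.

```yaml
name: Scorecard
on:
  schedule:
    - cron: '30 4 * * 1'  # assumption: weekly, Monday 04:30 UTC
  push:
    branches: [main]

permissions: read-all

jobs:
  analysis:
    runs-on: ubuntu-latest
    permissions:
      security-events: write  # upload SARIF results to code scanning
      contents: read
      actions: read
    steps:
      - uses: actions/checkout@v4
        with:
          persist-credentials: false
      - uses: ossf/scorecard-action@v2.4.0  # assumption: replace with current release
        with:
          results_file: results.sarif
          results_format: sarif
          publish_results: false
      - uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif
```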

Common Mistakes

  • Hardcoding secrets in workflow files: Use repository or environment secrets. Never put API keys, tokens, or passwords directly in YAML.
  • Not setting timeout-minutes: A stuck job burns 6 hours of CI minutes before it times out. Always set an explicit timeout.
  • Using ubuntu-latest in production pipelines: Pin to a specific version like ubuntu-22.04 to avoid surprise breakage from runner image updates.
  • Ignoring the restore-keys field in caching: Without restore-keys, any change to your lockfile means a complete cache miss. Always provide fallback prefixes.
  • Running all tests in a single job: Split tests across parallel jobs using matrix strategy. A 30-minute test suite becomes 10 minutes with 3 parallel jobs.
  • Not using if: failure() for debug artifacts: Test logs, screenshots, and reports are only useful when tests fail. Use conditional uploads to save storage.
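
Splitting a test suite across parallel jobs with a matrix can be sketched as follows. This assumes the project uses Jest 28 or later, whose --shard flag divides the suite; other test runners have their own sharding options.

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false  # let every shard finish even if one fails
      matrix:
        shard: [1, 2, 3]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      # Each job runs one third of the test files
      - run: npx jest --shard=${{ matrix.shard }}/3
```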

Frequently Asked Questions

Why does my GitHub Actions workflow fail on pull_request but not on push?

The most common reason is secrets availability. For pull requests from forks, GitHub does not inject repository secrets for security reasons. Your workflow might pass on push (where secrets are available) but fail on pull_request because environment variables resolve to empty strings. Another cause is the GITHUB_TOKEN permission scope, which is read-only for fork PRs by default. Check your workflow for any step that writes to the repo or calls an API that needs write access.

How do I debug a GitHub Actions workflow locally without pushing every change?

Use nektos/act, an open-source tool that runs GitHub Actions locally using Docker. Install it with brew install act (macOS) or your package manager, then run act in your repo root. It reads your .github/workflows/ files and executes them in Docker containers that mimic GitHub-hosted runners. It does not support every feature (like OIDC or certain GitHub-hosted services), but it catches YAML errors, script bugs, and environment issues before you push.

Can I re-run only the failed jobs in a GitHub Actions workflow?

Yes. Go to the Actions tab, click the failed workflow run, and click Re-run failed jobs in the top right. This skips any jobs that already passed and only re-executes the ones that failed. This is useful for flaky tests or transient network errors. You can also re-run a single job by clicking on it and selecting Re-run this job. Note that re-runs use the same commit SHA and workflow file from the original run, not the latest version on the branch.

The Bottom Line

GitHub Actions failures are rarely mysterious once you know where to look. Start with the error logs (Step 1), then work through the categories: YAML syntax, secrets, permissions, Docker, tests, timeouts, runners, and caching. Most failures fall into one of these eight buckets, and the fix is usually a one- or two-line change.

The single best investment you can make is setting up act for local testing and actionlint for YAML validation. Together, they catch the majority of issues before you even push. Your CI minutes (and your teammates) will thank you.

Written by Usman Khan
DevOps Engineer | MSc Cybersecurity | CEH | AWS Solutions Architect

Usman has 10+ years of experience securing enterprise infrastructure, managing high-traffic servers, and building zero-knowledge security tools.