This guide helps you diagnose and resolve common issues with Skyhook CI/CD workflows.

Build Failures

Docker Build Fails

Symptoms:
  • Build step fails in GitHub Actions
  • Error messages about Dockerfile syntax or missing files
  • Build context errors
Common causes and solutions:
  1. Invalid Dockerfile syntax
    Check your Dockerfile for:
    - Correct instruction format (FROM, RUN, COPY, etc.)
    - Proper line continuation with backslashes
    - Valid base image references
    
  2. Missing files or directories
    Ensure files referenced in COPY or ADD commands exist:
    - Check file paths are relative to build context
    - Verify .dockerignore isn't excluding required files
    - Confirm files are committed to Git
    
  3. Base image not found
    Verify base image exists and is accessible:
    - Check image name and tag are correct
    - Ensure you have access to private registries
    - Try pulling the image locally first
    
  4. Build context too large
    Reduce build context size:
    - Add node_modules, .git, etc. to .dockerignore
    - Remove unnecessary files from repository
    - Use multi-stage builds to minimize final image size
    
Debugging steps:
  1. Review the full build logs in GitHub Actions
  2. Try building locally: docker build -t test .
  3. Check recent changes to Dockerfile or dependencies
  4. Verify all required files are in the repository
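Cause 4 above recommends multi-stage builds to keep images small; an illustrative sketch (the Node.js base images, paths, and commands are examples, not Skyhook requirements):

```dockerfile
# Illustrative multi-stage build: compile in a full toolchain image,
# then copy only the build output into a slim runtime image.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final stage: only the artifacts needed at runtime
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Combined with a .dockerignore that excludes node_modules and .git, this keeps both the build context and the final image small.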

Image Push Fails

Symptoms:
  • Build succeeds but push to registry fails
  • Authentication errors to container registry
  • “Repository does not exist” errors
Common causes and solutions:
  1. Invalid registry credentials
    For AWS ECR:
    - Verify AWS_DEPLOY_ROLE or AWS credentials are correct
    - Check IAM role has ecr:GetAuthorizationToken permission
    - Ensure role trust policy includes GitHub OIDC provider
    
    For GCP Artifact Registry:
    - Verify WIF_PROVIDER and WIF_SERVICE_ACCOUNT are correct
    - Check service account has Artifact Registry Writer role
    - Ensure Workload Identity binding is configured
    
    For Azure ACR:
    - Verify service principal credentials
    - Check service principal has AcrPush role
    
  2. Repository doesn’t exist
    AWS ECR: Repositories are created automatically
    - Verify IAM permissions include ecr:CreateRepository
    
    GCP: Create the repository manually
    - gcloud artifacts repositories create REPO --repository-format=docker --location=LOCATION
    
    Azure ACR: Repositories are created automatically on first push
    - Ensure the registry itself exists (az acr create)
    
  3. Registry URL is incorrect
    Check registry format in .koala.toml:
    - AWS: 123456789.dkr.ecr.us-east-1.amazonaws.com
    - GCP: us-docker.pkg.dev/project-id/repository
    - Azure: myregistry.azurecr.io
    
Debugging steps:
  1. Verify registry URL in workflow logs
  2. Test authentication manually with cloud CLI tools
  3. Check registry permissions in cloud console
  4. Review GitHub secrets configuration
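The registry formats listed under cause 3 can be sanity-checked before opening a cloud console; a small shell sketch (the hostnames are the example values from above):

```shell
# Classify a registry URL by the hostname pattern each cloud uses.
# Anything that matches none of these is worth a second look.
registry_provider() {
  case "$1" in
    *.dkr.ecr.*.amazonaws.com) echo "aws" ;;
    *-docker.pkg.dev/*)        echo "gcp" ;;
    *.azurecr.io)              echo "azure" ;;
    *)                         echo "unknown" ;;
  esac
}

registry_provider "123456789.dkr.ecr.us-east-1.amazonaws.com"
registry_provider "us-docker.pkg.dev/project-id/repository"
registry_provider "myregistry.azurecr.io"
```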

Deployment Failures

kubectl Deployment Fails

Symptoms:
  • Deployment step fails after successful build
  • “connection refused” or “unauthorized” errors
  • kubectl commands timeout
Common causes and solutions:
  1. Invalid cluster credentials
    Verify cluster access:
    - Check cluster name and region are correct
    - Ensure credentials have cluster admin permissions
    - Test connection: kubectl get nodes
    
  2. Cluster name format incorrect
    Verify cluster format matches cloud provider:
    
    AWS: aws/123456789/us-east-1/my-cluster
    - Account ID, region, cluster name must be exact
    
    GCP: gcp/my-project/us-central1/my-cluster
    - Project ID, location, cluster name must match
    
    Azure: azure/subscription-id/eastus/my-cluster
    - Subscription ID, region, cluster name must match
    
  3. Insufficient permissions
    Ensure service account/role has Kubernetes permissions:
    - AWS: IAM role needs eks:DescribeCluster
    - GCP: Service account needs container.clusters.get
    - Azure: Service principal needs AKS Cluster User
    
  4. Invalid Kubernetes manifests
    Check manifest syntax:
    - Validate YAML syntax
    - Ensure apiVersion is correct for your cluster
    - Verify resource names follow Kubernetes naming rules
    - Test locally: kubectl apply --dry-run=client -f manifests/
    
Debugging steps:
  1. Review kubectl output in workflow logs
  2. Verify cluster exists and is accessible
  3. Check Kubernetes manifest syntax
  4. Test deployment locally with kubectl
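The cluster identifier formats under cause 2 are easiest to verify piece by piece; a shell sketch that splits the example AWS identifier so each segment can be checked against the cloud console:

```shell
# Split a Skyhook cluster identifier (provider/account-or-project/region/name)
# into its parts. The identifier below is the AWS example from above.
cluster="aws/123456789/us-east-1/my-cluster"

old_ifs=$IFS
IFS=/
set -- $cluster
IFS=$old_ifs

provider=$1 account=$2 region=$3 name=$4
echo "provider=$provider account=$account region=$region name=$name"
```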

ArgoCD Not Syncing

Symptoms:
  • Workflow completes but changes don’t appear in cluster
  • ArgoCD shows “OutOfSync” status
  • Application health degraded
Common causes and solutions:
  1. ArgoCD not configured correctly
    Verify ArgoCD setup:
    - Check ArgoCD is installed in cluster
    - Ensure Application resource exists
    - Verify repository is connected in ArgoCD
    - Check sync policy is configured
    
  2. Repository access issues
    Ensure ArgoCD can access deployment repository:
    - For private repos: Add deploy key or credentials in ArgoCD
    - Verify repository URL is correct
    - Check branch name matches ArgoCD Application spec
    
  3. Manifest path incorrect
    Check path configuration:
    - Verify path in ArgoCD Application matches repo structure
    - Ensure kustomization.yaml exists at specified path
    - Check environment overlay path is correct
    
  4. Sync policy prevents auto-sync
    Check ArgoCD Application sync settings:
    - Enable automated sync if desired
    - Check for required manual approval
    - Review sync options and prune settings
    
Debugging steps:
  1. Check ArgoCD UI for application status
  2. Review sync logs from the application controller: kubectl logs -n argocd <argocd-application-controller-pod>
  3. Verify Git commits appear in deployment repository
  4. Manually trigger sync in ArgoCD UI
  5. Check ArgoCD application events: kubectl describe application -n argocd <app-name>
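Causes 1–4 all come down to fields in the ArgoCD Application resource; an illustrative example with the fields that most often disagree with reality called out in comments (names, URLs, and paths are placeholders):

```yaml
# Illustrative ArgoCD Application; all names and URLs are examples.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ORG/deploy-repo  # must match a repo connected in ArgoCD
    targetRevision: main                         # must match the branch the workflow commits to
    path: overlays/production                    # must contain kustomization.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:          # enables auto-sync; omit for manual sync only
      prune: true
      selfHeal: true
```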

Authentication Issues

AWS Authentication Fails

Symptoms:
  • “Unable to locate credentials” error
  • “Access denied” when accessing EKS or ECR
  • OIDC token validation errors
Common causes and solutions:
  1. OIDC provider not configured
    Ensure OIDC provider exists in AWS IAM:
    - Provider URL: token.actions.githubusercontent.com
    - Audience: sts.amazonaws.com
    - Verify provider is active
    
  2. IAM role trust policy incorrect
    Check the role trust policy includes the GitHub OIDC provider (ACCOUNT, ORG, and REPO are placeholders):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::ACCOUNT:oidc-provider/token.actions.githubusercontent.com"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "token.actions.githubusercontent.com:sub": "repo:ORG/REPO:ref:refs/heads/main"
            }
          }
        }
      ]
    }
    
  3. Missing IAM permissions
    Verify role has required policies:
    - EKS access: eks:DescribeCluster, eks:ListClusters
    - ECR access: ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability,
                  ecr:PutImage, ecr:InitiateLayerUpload, ecr:UploadLayerPart,
                  ecr:CompleteLayerUpload, ecr:CreateRepository
    
  4. AWS_DEPLOY_ROLE variable not set
    Configure in GitHub repository:
    - Go to Settings → Secrets and Variables → Actions
    - Add variable: AWS_DEPLOY_ROLE
    - Value: arn:aws:iam::123456789:role/github-actions-deploy
    
Debugging steps:
  1. Verify OIDC provider exists in IAM
  2. Check role ARN is correct in GitHub variable
  3. Review IAM role trust policy and permissions
  4. Test role assumption locally with AWS CLI
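For step 2, the AWS_DEPLOY_ROLE value can be format-checked before digging into IAM; a shell sketch (the ARN is a made-up example, and note that real AWS account IDs are 12 digits):

```shell
# Check that a value looks like a valid IAM role ARN:
# arn:aws:iam::<12-digit-account>:role/<name>
valid_role_arn() {
  printf '%s' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$'
}

valid_role_arn "arn:aws:iam::123456789012:role/github-actions-deploy" && echo "format OK"
```

A common mistake this catches is pasting only the role name (or the role's console URL) into the GitHub variable instead of the full ARN.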

GCP Authentication Fails

Symptoms:
  • “Permission denied” errors
  • “Invalid JWT” or token validation errors
  • Cannot access GKE or Artifact Registry
Common causes and solutions:
  1. Workload Identity not configured
    Verify Workload Identity setup:
    - Pool exists: gcloud iam workload-identity-pools describe github-actions
    - Provider configured for GitHub OIDC
    - Attribute mapping includes repository
    
  2. Service account permissions missing
    Ensure service account has required roles:
    - GKE access: roles/container.developer
    - Artifact Registry: roles/artifactregistry.writer
    - Basic: roles/iam.serviceAccountTokenCreator
    
  3. Workload Identity binding incorrect
    Check service account IAM policy binding:
    gcloud iam service-accounts get-iam-policy \
      [email protected]
    
    Should include:
    - Role: roles/iam.workloadIdentityUser
    - Member: principalSet://iam.googleapis.com/projects/PROJECT_NUM/locations/global/workloadIdentityPools/github-actions/attribute.repository/ORG/REPO
    
  4. WIF variables not set correctly
    Configure in GitHub repository:
    - WIF_PROVIDER: projects/PROJECT_NUM/locations/global/workloadIdentityPools/github-actions/providers/github
    - WIF_SERVICE_ACCOUNT: [email protected]
    
Debugging steps:
  1. Verify Workload Identity pool and provider exist
  2. Check service account has necessary roles
  3. Review Workload Identity binding for repository
  4. Test authentication locally with gcloud
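The WIF_PROVIDER and Workload Identity member strings are long and easy to mistype; assembling them from their parts makes a bad segment obvious. A shell sketch with example values (project number, pool, and provider names are placeholders):

```shell
# Build the Workload Identity strings from their components so each
# part can be verified independently. All values here are examples.
PROJECT_NUM="123456789012"
POOL="github-actions"
PROVIDER="github"
REPO="ORG/REPO"

WIF_PROVIDER="projects/${PROJECT_NUM}/locations/global/workloadIdentityPools/${POOL}/providers/${PROVIDER}"
WIF_MEMBER="principalSet://iam.googleapis.com/projects/${PROJECT_NUM}/locations/global/workloadIdentityPools/${POOL}/attribute.repository/${REPO}"

echo "$WIF_PROVIDER"
echo "$WIF_MEMBER"
```

Note that WIF_PROVIDER uses the project number, not the project ID; mixing those up is a frequent cause of "Invalid JWT" errors.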

GitHub Authentication Fails

Symptoms:
  • Cannot access deployment repository
  • “Resource not accessible by integration” error
  • PAT or GitHub App authentication fails
Common causes and solutions:
  1. GitHub App not installed
    Verify GitHub App installation:
    - Check app is installed on target repository
    - Review app permissions include repository access
    - Ensure app has Contents: Read & Write permission
    
  2. GitHub App credentials incorrect
    Check GitHub secrets:
    - GH_APP_ID: Numeric app ID (not app name)
    - GH_APP_PK: Complete private key including headers
      -----BEGIN RSA PRIVATE KEY-----
      ...
      -----END RSA PRIVATE KEY-----
    
  3. PAT lacks required permissions
    If using Personal Access Token:
    - Ensure PAT has 'repo' scope
    - Check PAT hasn't expired
    - Verify user has access to deployment repository
    
  4. Cross-organization access
    For accessing repos in different organizations:
    - GitHub App must be installed in target org
    - PAT user must be member of target org
    - Check organization SSO requirements
    
Debugging steps:
  1. Verify GitHub App installation and permissions
  2. Check secret values are complete and correct
  3. Test repository access manually
  4. Review workflow logs for specific error messages
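A truncated GH_APP_PK secret (missing the BEGIN/END lines) is a frequent cause of these failures; a shell sketch that checks for the PEM header and footer (the key body here is a placeholder, not a real key):

```shell
# Verify a private key value starts and ends with the PEM markers.
# The "key" below is an obvious placeholder for demonstration.
pk='-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAplaceholder
-----END RSA PRIVATE KEY-----'

pem_complete() {
  printf '%s\n' "$1" | head -n 1 | grep -q 'BEGIN RSA PRIVATE KEY' &&
  printf '%s\n' "$1" | tail -n 1 | grep -q 'END RSA PRIVATE KEY'
}

pem_complete "$pk" && echo "key format OK"
```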

Common Error Messages

“ImagePullBackOff” in Kubernetes

Cause: Kubernetes cannot pull the Docker image.
Solutions:
  1. Verify image tag exists in registry
  2. Check image name and registry URL are correct
  3. Ensure Kubernetes has credentials to access private registry
  4. For ECR: Verify ECR image pull secret is configured
  5. Check network connectivity from cluster to registry
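Solution 3 usually means wiring an image pull secret into the pod spec; an illustrative fragment (secret, pod, and image names are examples):

```yaml
# Illustrative pull-secret wiring for a private registry.
# Create the secret first, e.g.:
#   kubectl create secret docker-registry regcred --docker-server=... --docker-username=... --docker-password=...
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: my-app
      image: myregistry.azurecr.io/my-app:1.0.0
```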

“CrashLoopBackOff” in Kubernetes

Cause: Container starts but immediately crashes.
Solutions:
  1. Check application logs: kubectl logs <pod-name>
  2. Verify environment variables are set correctly
  3. Ensure required secrets and config maps exist
  4. Check application dependencies (database, APIs) are accessible
  5. Review resource limits aren’t too restrictive
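For solution 5, memory limits set too low surface as repeated OOM kills that show up as CrashLoopBackOff; an illustrative container resources fragment (values are examples only, tune to your workload):

```yaml
# Illustrative resource settings for a container spec.
# A memory limit below the app's working set causes OOM kills.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 512Mi
```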

“Workflow dispatch failed”

Cause: Cannot trigger GitHub Actions workflow.
Solutions:
  1. Verify workflow file exists in repository
  2. Check workflow_dispatch trigger is configured
  3. Ensure you have permission to trigger workflows
  4. Review workflow inputs match expected parameters
  5. Check GitHub Actions is enabled for repository
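Solution 2 requires the workflow file to declare the trigger explicitly; a minimal illustrative fragment (the input names are examples and must match what the dispatcher sends):

```yaml
# Minimal workflow_dispatch trigger for a GitHub Actions workflow.
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        required: true
        default: staging
```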

Getting Help

If you’re still experiencing issues:
  1. Check GitHub Actions logs - Detailed error messages and stack traces
  2. Review Skyhook documentation - Additional guides and examples
  3. Verify configuration - Double-check .koala.toml and secrets
  4. Test components individually - Isolate the failing step
  5. Contact support - Provide workflow run URL and error details

Next Steps