This guide helps you diagnose and resolve common issues with Skyhook CI/CD workflows.

Build Failures

Docker Build Fails

Symptoms:
  • Build step fails in GitHub Actions
  • Error messages about Dockerfile syntax or missing files
  • Build context errors
Common causes and solutions:
  1. Invalid Dockerfile syntax
    Check your Dockerfile for:
    - Correct instruction format (FROM, RUN, COPY, etc.)
    - Proper line continuation with backslashes
    - Valid base image references
    
  2. Missing files or directories
    Ensure files referenced in COPY or ADD commands exist:
    - Check file paths are relative to build context
    - Verify .dockerignore isn't excluding required files
    - Confirm files are committed to Git
    
  3. Base image not found
    Verify base image exists and is accessible:
    - Check image name and tag are correct
    - Ensure you have access to private registries
    - Try pulling the image locally first
    
  4. Build context too large
    Reduce build context size:
    - Add node_modules, .git, etc. to .dockerignore
    - Remove unnecessary files from repository
    - Use multi-stage builds to minimize final image size
    
Debugging steps:
  1. Review the full build logs in GitHub Actions
  2. Try building locally: docker build -t test .
  3. Check recent changes to Dockerfile or dependencies
  4. Verify all required files are in the repository
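Cause 4 above recommends multi-stage builds to keep images small; an illustrative sketch (the Node.js base images, paths, and commands are examples, not Skyhook requirements):

```dockerfile
# Illustrative multi-stage build: compile in a full toolchain image,
# then copy only the build output into a slim runtime image.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final stage: only the artifacts needed at runtime
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]
```

Combined with a .dockerignore that excludes node_modules and .git, this keeps both the build context and the final image small.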

Image Push Fails

Symptoms:
  • Build succeeds but push to registry fails
  • Authentication errors to container registry
  • “Repository does not exist” errors
Common causes and solutions:
  1. Invalid registry credentials
    For AWS ECR:
    - Verify AWS_DEPLOY_ROLE or AWS credentials are correct
    - Check IAM role has ecr:GetAuthorizationToken permission
    - Ensure role trust policy includes GitHub OIDC provider
    
    For GCP Artifact Registry:
    - Verify WIF_PROVIDER and WIF_SERVICE_ACCOUNT are correct
    - Check service account has Artifact Registry Writer role
    - Ensure Workload Identity binding is configured
    
    For Azure ACR:
    - Verify service principal credentials
    - Check service principal has AcrPush role
    
  2. Repository doesn’t exist
    AWS ECR: Repositories are created automatically
    - Verify IAM permissions include ecr:CreateRepository
    
    GCP: Create the repository manually
    - gcloud artifacts repositories create REPO --repository-format=docker --location=LOCATION
    
    Azure ACR: Repositories are created automatically on first push
    - Ensure the registry itself exists (az acr create)
    
  3. Registry URL is incorrect
    Check registry format in .koala.toml:
    - AWS: 123456789.dkr.ecr.us-east-1.amazonaws.com
    - GCP: us-docker.pkg.dev/project-id/repository
    - Azure: myregistry.azurecr.io
    
Debugging steps:
  1. Verify registry URL in workflow logs
  2. Test authentication manually with cloud CLI tools
  3. Check registry permissions in cloud console
  4. Review GitHub secrets configuration
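The registry formats listed under cause 3 can be sanity-checked before opening a cloud console; a small shell sketch (the hostnames are the example values from above):

```shell
# Classify a registry URL by the hostname pattern each cloud uses.
# Anything that matches none of these is worth a second look.
registry_provider() {
  case "$1" in
    *.dkr.ecr.*.amazonaws.com) echo "aws" ;;
    *-docker.pkg.dev/*)        echo "gcp" ;;
    *.azurecr.io)              echo "azure" ;;
    *)                         echo "unknown" ;;
  esac
}

registry_provider "123456789.dkr.ecr.us-east-1.amazonaws.com"
registry_provider "us-docker.pkg.dev/project-id/repository"
registry_provider "myregistry.azurecr.io"
```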

Deployment Failures

kubectl Deployment Fails

Symptoms:
  • Deployment step fails after successful build
  • “connection refused” or “unauthorized” errors
  • kubectl commands timeout
Common causes and solutions:
  1. Invalid cluster credentials
    Verify cluster access:
    - Check cluster name and region are correct
    - Ensure credentials have cluster admin permissions
    - Test connection: kubectl get nodes
    
  2. Cluster name format incorrect
    Verify cluster format matches cloud provider:
    
    AWS: aws/123456789/us-east-1/my-cluster
    - Account ID, region, cluster name must be exact
    
    GCP: gcp/my-project/us-central1/my-cluster
    - Project ID, location, cluster name must match
    
    Azure: azure/subscription-id/eastus/my-cluster
    - Subscription ID, region, cluster name must match
    
  3. Insufficient permissions
    Ensure service account/role has Kubernetes permissions:
    - AWS: IAM role needs eks:DescribeCluster
    - GCP: Service account needs container.clusters.get
    - Azure: Service principal needs AKS Cluster User
    
  4. Invalid Kubernetes manifests
    Check manifest syntax:
    - Validate YAML syntax
    - Ensure apiVersion is correct for your cluster
    - Verify resource names follow Kubernetes naming rules
    - Test locally: kubectl apply --dry-run=client -f manifests/
    
Debugging steps:
  1. Review kubectl output in workflow logs
  2. Verify cluster exists and is accessible
  3. Check Kubernetes manifest syntax
  4. Test deployment locally with kubectl
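The cluster identifier formats under cause 2 are easiest to verify piece by piece; a shell sketch that splits the example AWS identifier so each segment can be checked against the cloud console:

```shell
# Split a Skyhook cluster identifier (provider/account-or-project/region/name)
# into its parts. The identifier below is the AWS example from above.
cluster="aws/123456789/us-east-1/my-cluster"

old_ifs=$IFS
IFS=/
set -- $cluster
IFS=$old_ifs

provider=$1 account=$2 region=$3 name=$4
echo "provider=$provider account=$account region=$region name=$name"
```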

ArgoCD Not Syncing

Symptoms:
  • Workflow completes but changes don’t appear in cluster
  • ArgoCD shows “OutOfSync” status
  • Application health degraded
Common causes and solutions:
  1. ArgoCD not configured correctly
    Verify ArgoCD setup:
    - Check ArgoCD is installed in cluster
    - Ensure Application resource exists
    - Verify repository is connected in ArgoCD
    - Check sync policy is configured
    
  2. Repository access issues
    Ensure ArgoCD can access deployment repository:
    - For private repos: Add deploy key or credentials in ArgoCD
    - Verify repository URL is correct
    - Check branch name matches ArgoCD Application spec
    
  3. Manifest path incorrect
    Check path configuration:
    - Verify path in ArgoCD Application matches repo structure
    - Ensure kustomization.yaml exists at specified path
    - Check environment overlay path is correct
    
  4. Sync policy prevents auto-sync
    Check ArgoCD Application sync settings:
    - Enable automated sync if desired
    - Check for required manual approval
    - Review sync options and prune settings
    
Debugging steps:
  1. Check ArgoCD UI for application status
  2. Review sync logs from the application controller: kubectl logs -n argocd <argocd-application-controller-pod>
  3. Verify Git commits appear in deployment repository
  4. Manually trigger sync in ArgoCD UI
  5. Check ArgoCD application events: kubectl describe application -n argocd <app-name>
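Causes 1–4 all come down to fields in the ArgoCD Application resource; an illustrative example with the fields that most often disagree with reality called out in comments (names, URLs, and paths are placeholders):

```yaml
# Illustrative ArgoCD Application; all names and URLs are examples.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/ORG/deploy-repo  # must match a repo connected in ArgoCD
    targetRevision: main                         # must match the branch the workflow commits to
    path: overlays/production                    # must contain kustomization.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:          # enables auto-sync; omit for manual sync only
      prune: true
      selfHeal: true
```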

Authentication Issues

AWS Authentication Fails

Symptoms:
  • “Unable to locate credentials” error
  • “Access denied” when accessing EKS or ECR
  • OIDC token validation errors
Common causes and solutions:
  1. OIDC provider not configured
    Ensure OIDC provider exists in AWS IAM:
    - Provider URL: token.actions.githubusercontent.com
    - Audience: sts.amazonaws.com
    - Verify provider is active
    
  2. IAM role trust policy incorrect
    Check the role trust policy includes the GitHub OIDC provider (ACCOUNT, ORG, and REPO are placeholders):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "Federated": "arn:aws:iam::ACCOUNT:oidc-provider/token.actions.githubusercontent.com"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
            "StringEquals": {
              "token.actions.githubusercontent.com:sub": "repo:ORG/REPO:ref:refs/heads/main"
            }
          }
        }
      ]
    }
    
  3. Missing IAM permissions
    Verify role has required policies:
    - EKS access: eks:DescribeCluster, eks:ListClusters
    - ECR access: ecr:GetAuthorizationToken, ecr:BatchCheckLayerAvailability,
                  ecr:PutImage, ecr:InitiateLayerUpload, ecr:UploadLayerPart,
                  ecr:CompleteLayerUpload, ecr:CreateRepository
    
  4. AWS_DEPLOY_ROLE variable not set
    Configure in GitHub repository:
    - Go to Settings → Secrets and Variables → Actions
    - Add variable: AWS_DEPLOY_ROLE
    - Value: arn:aws:iam::123456789:role/github-actions-deploy
    
Debugging steps:
  1. Verify OIDC provider exists in IAM
  2. Check role ARN is correct in GitHub variable
  3. Review IAM role trust policy and permissions
  4. Test role assumption locally with AWS CLI
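For step 2, the AWS_DEPLOY_ROLE value can be format-checked before digging into IAM; a shell sketch (the ARN is a made-up example, and note that real AWS account IDs are 12 digits):

```shell
# Check that a value looks like a valid IAM role ARN:
# arn:aws:iam::<12-digit-account>:role/<name>
valid_role_arn() {
  printf '%s' "$1" | grep -Eq '^arn:aws:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$'
}

valid_role_arn "arn:aws:iam::123456789012:role/github-actions-deploy" && echo "format OK"
```

A common mistake this catches is pasting only the role name (or the role's console URL) into the GitHub variable instead of the full ARN.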

GCP Authentication Fails

Symptoms:
  • “Permission denied” errors
  • “Invalid JWT” or token validation errors
  • Cannot access GKE or Artifact Registry
Common causes and solutions:
  1. Workload Identity not configured
    Verify Workload Identity setup:
    - Pool exists: gcloud iam workload-identity-pools describe github-actions
    - Provider configured for GitHub OIDC
    - Attribute mapping includes repository
    
  2. Service account permissions missing
    Ensure service account has required roles:
    - GKE access: roles/container.developer
    - Artifact Registry: roles/artifactregistry.writer
    - Basic: roles/iam.serviceAccountTokenCreator
    
  3. Workload Identity binding incorrect
    Check service account IAM policy binding:
    gcloud iam service-accounts get-iam-policy \
      [email protected]
    
    Should include:
    - Role: roles/iam.workloadIdentityUser
    - Member: principalSet://iam.googleapis.com/projects/PROJECT_NUM/locations/global/workloadIdentityPools/github-actions/attribute.repository/ORG/REPO
    
  4. WIF variables not set correctly
    Configure in GitHub repository:
    - WIF_PROVIDER: projects/PROJECT_NUM/locations/global/workloadIdentityPools/github-actions/providers/github
    - WIF_SERVICE_ACCOUNT: [email protected]
    
Debugging steps:
  1. Verify Workload Identity pool and provider exist
  2. Check service account has necessary roles
  3. Review Workload Identity binding for repository
  4. Test authentication locally with gcloud
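The WIF_PROVIDER and Workload Identity member strings are long and easy to mistype; assembling them from their parts makes a bad segment obvious. A shell sketch with example values (project number, pool, and provider names are placeholders):

```shell
# Build the Workload Identity strings from their components so each
# part can be verified independently. All values here are examples.
PROJECT_NUM="123456789012"
POOL="github-actions"
PROVIDER="github"
REPO="ORG/REPO"

WIF_PROVIDER="projects/${PROJECT_NUM}/locations/global/workloadIdentityPools/${POOL}/providers/${PROVIDER}"
WIF_MEMBER="principalSet://iam.googleapis.com/projects/${PROJECT_NUM}/locations/global/workloadIdentityPools/${POOL}/attribute.repository/${REPO}"

echo "$WIF_PROVIDER"
echo "$WIF_MEMBER"
```

Note that WIF_PROVIDER uses the project number, not the project ID; mixing those up is a frequent cause of "Invalid JWT" errors.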

GitHub Authentication Fails

Symptoms:
  • Cannot access deployment repository
  • “Resource not accessible by integration” error
  • PAT or GitHub App authentication fails
Common causes and solutions:
  1. GitHub App not installed
    Verify GitHub App installation:
    - Check app is installed on target repository
    - Review app permissions include repository access
    - Ensure app has Contents: Read & Write permission
    
  2. GitHub App credentials incorrect
    Check GitHub secrets:
    - GH_APP_ID: Numeric app ID (not app name)
    - GH_APP_PK: Complete private key including headers
      -----BEGIN RSA PRIVATE KEY-----
      ...
      -----END RSA PRIVATE KEY-----
    
  3. PAT lacks required permissions
    If using Personal Access Token:
    - Ensure PAT has 'repo' scope
    - Check PAT hasn't expired
    - Verify user has access to deployment repository
    
  4. Cross-organization access
    For accessing repos in different organizations:
    - GitHub App must be installed in target org
    - PAT user must be member of target org
    - Check organization SSO requirements
    
Debugging steps:
  1. Verify GitHub App installation and permissions
  2. Check secret values are complete and correct
  3. Test repository access manually
  4. Review workflow logs for specific error messages
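A truncated GH_APP_PK secret (missing the BEGIN/END lines) is a frequent cause of these failures; a shell sketch that checks for the PEM header and footer (the key body here is a placeholder, not a real key):

```shell
# Verify a private key value starts and ends with the PEM markers.
# The "key" below is an obvious placeholder for demonstration.
pk='-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAplaceholder
-----END RSA PRIVATE KEY-----'

pem_complete() {
  printf '%s\n' "$1" | head -n 1 | grep -q 'BEGIN RSA PRIVATE KEY' &&
  printf '%s\n' "$1" | tail -n 1 | grep -q 'END RSA PRIVATE KEY'
}

pem_complete "$pk" && echo "key format OK"
```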

Common Error Messages

“ImagePullBackOff” in Kubernetes

Cause: Kubernetes cannot pull the Docker image.
Solutions:
  1. Verify image tag exists in registry
  2. Check image name and registry URL are correct
  3. Ensure Kubernetes has credentials to access private registry
  4. For ECR: Verify ECR image pull secret is configured
  5. Check network connectivity from cluster to registry
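Solution 3 usually means wiring an image pull secret into the pod spec; an illustrative fragment (secret, pod, and image names are examples):

```yaml
# Illustrative pull-secret wiring for a private registry.
# Create the secret first, e.g.:
#   kubectl create secret docker-registry regcred --docker-server=... --docker-username=... --docker-password=...
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: my-app
      image: myregistry.azurecr.io/my-app:1.0.0
```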

“CrashLoopBackOff” in Kubernetes

Cause: Container starts but immediately crashes.
Solutions:
  1. Check application logs: kubectl logs <pod-name>
  2. Verify environment variables are set correctly
  3. Ensure required secrets and config maps exist
  4. Check application dependencies (database, APIs) are accessible
  5. Review resource limits aren’t too restrictive
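For solution 5, memory limits set too low surface as repeated OOM kills that show up as CrashLoopBackOff; an illustrative container resources fragment (values are examples only, tune to your workload):

```yaml
# Illustrative resource settings for a container spec.
# A memory limit below the app's working set causes OOM kills.
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    memory: 512Mi
```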

“Workflow dispatch failed”

Cause: Cannot trigger GitHub Actions workflow.
Solutions:
  1. Verify workflow file exists in repository
  2. Check workflow_dispatch trigger is configured
  3. Ensure you have permission to trigger workflows
  4. Review workflow inputs match expected parameters
  5. Check GitHub Actions is enabled for repository
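Solution 2 requires the workflow file to declare the trigger explicitly; a minimal illustrative fragment (the input names are examples and must match what the dispatcher sends):

```yaml
# Minimal workflow_dispatch trigger for a GitHub Actions workflow.
on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        required: true
        default: staging
```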

Getting Help

If you’re still experiencing issues:
  1. Check GitHub Actions logs - Detailed error messages and stack traces
  2. Review Skyhook documentation - Additional guides and examples
  3. Verify configuration - Double-check .koala.toml and secrets
  4. Test components individually - Isolate the failing step
  5. Contact support - Provide workflow run URL and error details

Next Steps