
# Self-Hosted Runner

## Quick reference

| Symptom                    | Cause                              | Fix                                                 |
| -------------------------- | ---------------------------------- | --------------------------------------------------- |
| Runner pods OOMKilled      | Insufficient memory limits         | Increase `bricksRunner.cm.resources.limits.memory`  |
| Tasks stuck in pending     | Controller processing bottleneck   | Increase `--max-concurrent-reconciles` (default: 3) |
| Auth errors in runner logs | API rate limiting or expired token | Verify API connectivity; check credentials          |
| Storage errors             | Persistent volume full             | Increase `bbxStorageManager.storage.size`           |
| Image pull failures        | Registry unreachable or wrong tag  | Verify image references and registry access         |
| Helm upgrade fails         | Version mismatch or CRD conflict   | Check chart version compatibility                   |

***

## Runner pods OOMKilled

When a Bricks Runner pod exceeds its memory limit, Kubernetes kills it with an OOMKilled status. This typically happens with large Terraform state files or complex plans.

### How to diagnose

```bash
kubectl get pods -n <bdc-namespace> | grep -i oom
kubectl describe pod <pod-name> -n <bdc-namespace> | grep -A 5 "Last State"
```
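`grep -i oom` only matches pods whose STATUS column currently reads `OOMKilled`. Once Kubernetes restarts the container, the status usually changes to `CrashLoopBackOff` or `Running`, and the OOM kill is recorded only in the container's last state. The following is a minimal sketch of a helper that checks that field instead. The function name `list_oomkilled_pods` is hypothetical, and it assumes `kubectl` and `awk` are on your `PATH`:

```bash
# Hypothetical helper: print the names of pods in a namespace whose containers
# were last terminated with reason OOMKilled, even if they have since restarted.
list_oomkilled_pods() {
  local ns="$1"
  # One line per pod: <name><TAB><last termination reason(s), space-separated>
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F'\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage:
# list_oomkilled_pods <bdc-namespace>
```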

### How to fix

Increase the memory limit above the chart default of 2048Mi in your `values.yaml`. For example:

```yaml
bricksRunner:
  cm:
    resources:
      requests:
        memory: "1024Mi"
      limits:
        memory: "4096Mi"
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -n <bdc-namespace> \
  -f values.yaml
```

{% hint style="info" %}
For sizing guidelines by cluster size, see [Sizing and Tuning](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning).
{% endhint %}

***

## Tasks stuck in pending

Tasks may remain in a pending state if the controller is overloaded and cannot process the task queue fast enough.

### How to diagnose

```bash
kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace>
```

Look for messages indicating task queue backups or reconciliation delays.

### How to fix

**Increase parallelism** by raising `--max-concurrent-reconciles` in your `values.yaml`. The default is `3`:

```yaml
command:
  args:
    - "start"
    - "operator"
    - "--controllers=all"
    - "--max-concurrent-reconciles"
    - "6"
```

This allows BDC to process more tasks in parallel. See the [sizing guidelines](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning) for recommended values.
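After `helm upgrade`, you can confirm which value the running controller actually received. The following is a minimal sketch under two assumptions: the controller Deployment carries the same `app.kubernetes.io/name=bluebricks-deployments-controller` label as its pods, and the controller is the first container in the pod template. The helper name is hypothetical. Because it is not certain whether the chart maps `command.args` to the container's `command` or its `args`, the sketch reads both:

```bash
# Hypothetical helper: print the --max-concurrent-reconciles value passed to the
# BDC controller, or "unset" if the flag is absent (BDC then uses its default of 3).
get_max_concurrent_reconciles() {
  local ns="$1" tmpl
  # Print the container's command and args one per line (assumes container 0 is the controller)
  tmpl='{range .items[0].spec.template.spec.containers[0].command[*]}{@}{"\n"}{end}'
  tmpl="$tmpl"'{range .items[0].spec.template.spec.containers[0].args[*]}{@}{"\n"}{end}'
  kubectl get deploy -n "$ns" \
    -l app.kubernetes.io/name=bluebricks-deployments-controller \
    -o jsonpath="$tmpl" \
    | awk '
        # Flag and value as separate arguments: --max-concurrent-reconciles 6
        prev == "--max-concurrent-reconciles" { print; found = 1; exit }
        # Flag and value in one argument: --max-concurrent-reconciles=6
        /^--max-concurrent-reconciles=/ { sub(/^--max-concurrent-reconciles=/, ""); print; found = 1; exit }
        { prev = $0 }
        END { if (!found) print "unset" }
      '
}

# Usage:
# get_max_concurrent_reconciles <bdc-namespace>
```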

***

## Auth errors in runner logs

Authentication errors in the runner logs usually indicate that BDC cannot communicate with the Bluebricks API, that its credentials are invalid or expired, or that requests are being rate-limited.

### How to diagnose

```bash
kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace> | grep -i auth
```

### How to fix

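1. **Verify API connectivity**: confirm BDC can reach `api.bluebricks.co` on port 443. `kubectl exec` does not accept a label selector, so look up the controller pod name first:

```bash
POD=$(kubectl get pods -n <bdc-namespace> -l app.kubernetes.io/name=bluebricks-deployments-controller \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n <bdc-namespace> "$POD" -- \
  curl -s -o /dev/null -w "%{http_code}\n" https://api.bluebricks.co/health/isalive
```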

2. **Rate limiting**: if you see rate-limit responses, increase the polling interval to reduce request volume
3. **Check credentials**: verify the BDC service account token or API key is valid
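The curl check prints only a status code. The following is a minimal sketch of mapping common codes to the steps above. It assumes the Bluebricks API follows standard HTTP semantics (`429 Too Many Requests` for rate limiting, `401`/`403` for rejected credentials). curl prints `000` when no HTTP response arrives at all. The helper name is hypothetical. Because the health endpoint is called without credentials, the `401`/`403` branch is mainly useful for status codes you find in the BDC logs:

```bash
# Hypothetical helper: map an HTTP status code to the matching troubleshooting step.
# Assumes standard HTTP semantics; not an official Bluebricks error table.
explain_api_status() {
  case "$1" in
    200)     echo "ok: API reachable" ;;
    401|403) echo "credentials: token or API key rejected" ;;
    429)     echo "rate-limited: increase the polling interval" ;;
    000)     echo "network: no HTTP response (DNS, firewall, proxy, or TLS)" ;;
    *)       echo "unexpected: HTTP $1" ;;
  esac
}

# Usage: explain_api_status "$(kubectl exec ... -- curl -s -o /dev/null -w '%{http_code}' ...)"
```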

***

## Storage errors

Storage errors occur when the persistent volume used by the storage manager is full.

### How to diagnose

```bash
kubectl get pvc -n <bdc-namespace>
kubectl describe pvc <pvc-name> -n <bdc-namespace>
```

### How to fix

Increase the storage size in your `values.yaml`:

```yaml
bbxStorageManager:
  storage:
    size: "30Gi"  # Minimum 10Gi; use 100Gi+ for high-volume clusters
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -n <bdc-namespace> \
  -f values.yaml
```

{% hint style="warning" %}
PVC resizing works only if the PVC's storage class supports volume expansion (`allowVolumeExpansion: true`). Check your cluster's storage class configuration before upgrading.
{% endhint %}
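The following is a minimal sketch of a hypothetical helper that checks expansion support for a given PVC. It reads the PVC's `storageClassName` and then that class's `allowVolumeExpansion` field. An unset field means expansion is not allowed. A PVC without an explicit storage class is reported as `unknown`:

```bash
# Hypothetical helper: report whether the storage class behind a PVC allows
# volume expansion. Prints "true", "false", or "unknown".
pvc_supports_expansion() {
  local ns="$1" pvc="$2" sc allow
  sc=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.storageClassName}')
  if [ -z "$sc" ]; then
    echo "unknown"   # PVC has no explicit storage class
    return
  fi
  allow=$(kubectl get storageclass "$sc" -o jsonpath='{.allowVolumeExpansion}')
  # An unset allowVolumeExpansion field means expansion is not allowed
  if [ "$allow" = "true" ]; then echo "true"; else echo "false"; fi
}

# Usage:
# pvc_supports_expansion <bdc-namespace> <pvc-name>
```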

***

## Image pull failures

If runner pods fail to start with `ImagePullBackOff` or `ErrImagePull`, the container images cannot be downloaded.

### Common causes

* **Registry unreachable**: the cluster cannot reach the image registry. The default Helm chart uses `europe-docker.pkg.dev/bbx-registry-prod/public-oci/`
* **Wrong image tag**: the specified version does not exist
* **Private registry without credentials**: if you mirror images to a private registry, pull secrets may be missing

### How to fix

1. Check which image the pod is trying to pull:

```bash
kubectl describe pod <pod-name> -n <bdc-namespace> | grep "Image:"
```

2. Identify which image is failing to pull, then confirm that its registry is reachable from the cluster and that its tag exists. BDC uses two different images: the **controller image** (`bdctl`) that runs the operator, and the **runner image** (`bricks`) that executes IaC operations.

3. If using a private registry, configure image pull secrets in your `values.yaml`
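To see which image is failing, the following is a minimal sketch of a hypothetical helper. It prints the image of every container, including init containers, that is currently waiting with `ImagePullBackOff` or `ErrImagePull`. It assumes the image references contain `bdctl` or `bricks` as described above, so the printed reference tells you which image to check:

```bash
# Hypothetical helper: for one pod, print each container image that is waiting
# on an image pull error.
failing_images() {
  local ns="$1" pod="$2" tmpl
  tmpl='{range .status.initContainerStatuses[*]}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}'
  tmpl="$tmpl"'{range .status.containerStatuses[*]}{.image}{"\t"}{.state.waiting.reason}{"\n"}{end}'
  kubectl get pod "$pod" -n "$ns" -o jsonpath="$tmpl" \
    | awk -F'\t' '$2 == "ImagePullBackOff" || $2 == "ErrImagePull" { print $1 }'
}

# Usage:
# failing_images <bdc-namespace> <pod-name>
```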

***

## Helm upgrade fails

Helm upgrades can fail due to version incompatibilities, CRD conflicts, or invalid values.

### Common causes

* **CRD already exists**: a previous installation left CRDs that conflict with the new version
* **Invalid values.yaml**: a field was renamed or removed in the new chart version
* **Pending Helm release**: a previous upgrade was interrupted, leaving the release in a bad state

### How to fix

1. Check the current release status:

```bash
helm list -n <bdc-namespace>
helm history bdc -n <bdc-namespace>
```

2. If the release is stuck in a pending state:

```bash
helm rollback bdc <last-successful-revision> -n <bdc-namespace>
```

3. For CRD conflicts, check installed CRDs:

```bash
kubectl get crd | grep bluebricks
```

4. Review the chart's release notes for breaking changes before upgrading
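Steps 2 and 3 can be scripted. The following is a minimal sketch of two hypothetical helpers:

- `last_good_revision` picks the newest revision whose status is `deployed` or `superseded` from the default table output of `helm history`. This relies on an assumption about that table's layout, so check the result before rolling back.
- `crd_owners` shows the Helm ownership annotations on Bluebricks CRDs.

Helm 3.2 and later can adopt an existing resource only if it carries the `meta.helm.sh/release-name` and `meta.helm.sh/release-namespace` annotations and the `app.kubernetes.io/managed-by: Helm` label. CRDs installed from a chart's `crds/` directory are never annotated this way. This page does not say which mechanism the BDC chart uses.

```bash
# Hypothetical helper: print the newest revision of a Helm release whose STATUS is
# "deployed" or "superseded", i.e. one that completed successfully.
last_good_revision() {
  local release="$1" ns="$2"
  # helm history lists revisions oldest-first; REVISION is the first column
  helm history "$release" -n "$ns" \
    | awk 'NR > 1 && /[ \t](deployed|superseded)[ \t]/ { rev = $1 } END { if (rev) print rev }'
}

# Hypothetical helper: show which Helm release, if any, owns each Bluebricks CRD.
# "<none>" means the CRD carries no Helm ownership annotation.
crd_owners() {
  kubectl get crd \
    -o custom-columns='NAME:.metadata.name,RELEASE:.metadata.annotations.meta\.helm\.sh/release-name,NAMESPACE:.metadata.annotations.meta\.helm\.sh/release-namespace' \
    | awk 'NR == 1 || /bluebricks/'
}

# Usage:
# helm rollback bdc "$(last_good_revision bdc <bdc-namespace>)" -n <bdc-namespace>
# crd_owners
```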

{% hint style="info" %}
For the full list of configurable values, see the [BDC Helm Chart](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner/bdc-helm-chart) reference.
{% endhint %}
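A renamed or removed field is a common cause of failed upgrades, so it can help to render the upgrade before applying it. The following is a minimal sketch that assumes Helm 3.8 or later (for OCI chart support) and the release name `bdc` used elsewhere on this page. The function name and the `chart-defaults.yaml` output file are hypothetical. `--dry-run` catches rendering errors, and schema violations if the chart ships a values schema. Otherwise Helm ignores unknown keys, so a renamed key can pass silently. For that reason, the sketch also saves the chart's default values for a manual comparison:

```bash
# Hypothetical pre-flight check before upgrading the "bdc" release.
preflight_upgrade() {
  local chart="$1" ns="$2" values="$3"
  # Save the chart's default values so renamed or removed keys can be compared by hand
  helm show values "$chart" > chart-defaults.yaml || return 1
  # Render the upgrade against the cluster without applying it
  helm upgrade bdc "$chart" -n "$ns" -f "$values" --dry-run > /dev/null || return 1
  echo "dry-run ok; compare $values with chart-defaults.yaml"
}

# Usage:
# preflight_upgrade \
#   oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
#   <bdc-namespace> values.yaml
```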

***

## Sizing guidelines

| Cluster size         | Concurrent workers (`--max-concurrent-reconciles`) | Runner memory | Storage |
| -------------------- | -------------------------------------------------- | ------------- | ------- |
| Small (< 10 nodes)   | 2-3                                                | 2048Mi        | 10Gi    |
| Medium (10-50 nodes) | 4-8                                                | 2048Mi        | 30Gi    |
| Large (50+ nodes)    | 8-16                                               | 4096Mi        | 100Gi+  |

{% hint style="warning" %}
The Helm chart's default runner memory is 2048Mi, which is also the minimum that Terraform operations require. Do not set runner memory below this value.
{% endhint %}
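If you generate `values.yaml` in CI, a small pre-flight check can enforce this floor before `helm upgrade` runs. The following is a minimal sketch with a hypothetical function name. It only understands whole-number `Mi` and `Gi` quantities. Kubernetes accepts other forms (such as `M`, `G`, or plain bytes), which this sketch rejects rather than converts:

```bash
# Hypothetical helper: convert a memory quantity (whole Mi or Gi only) to MiB and
# fail if it is below the 2048Mi minimum for Terraform operations.
check_runner_memory() {
  local qty="$1" num unit mib
  num="${qty%[MG]i}"     # strip a trailing Mi or Gi suffix, if present
  unit="${qty#"$num"}"   # the suffix that was stripped ("" if none)
  case "$num" in
    ''|*[!0-9]*)
      echo "unsupported quantity: $qty (use whole Mi or Gi values)" >&2
      return 2 ;;
  esac
  case "$unit" in
    Gi) mib=$((num * 1024)) ;;
    Mi) mib=$num ;;
    *)  echo "unsupported unit in: $qty (use Mi or Gi)" >&2; return 2 ;;
  esac
  if [ "$mib" -lt 2048 ]; then
    echo "too low: $qty is below the 2048Mi minimum" >&2
    return 1
  fi
  echo "ok: ${mib}Mi"
}

# Usage:
# check_runner_memory 4096Mi
```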

***

## Need more help?

1. Check BDC logs: `kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace>`
2. Review the [Self-Hosted Runner](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner) documentation for installation and configuration details
3. [Contact support](https://www.bluebricks.co/support) with your Helm chart version, `values.yaml`, and relevant pod logs