# Self-Hosted Runner

## Quick reference

| Symptom                    | Cause                              | Fix                                                           |
| -------------------------- | ---------------------------------- | ------------------------------------------------------------- |
| Runner pods OOMKilled      | Insufficient memory limits         | Increase `bricksRunner.cm.resources.limits.memory`            |
| Tasks stuck in pending     | Slow polling interval              | Reduce `orchestrator.resyncPeriod` (default: 5s)              |
| Auth errors in runner logs | API rate limiting or expired token | Increase `orchestrator.resyncPeriod`; verify API connectivity |
| Storage errors             | Persistent volume full             | Increase `bbxStorageManager.storage.size`                     |
| Image pull failures        | Registry unreachable or wrong tag  | Verify image references and registry access                   |
| Helm upgrade fails         | Version mismatch or CRD conflict   | Check chart version compatibility                             |

***

## Runner pods OOMKilled

When a Bricks Runner pod exceeds its memory limit, Kubernetes kills it with an OOMKilled status. This typically happens with large Terraform state files or complex plans.

### How to diagnose

```bash
kubectl get pods -n <bdc-namespace> | grep -i oom
kubectl describe pod <pod-name> -n <bdc-namespace> | grep -A 5 "Last State"
```
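
In namespaces with many pods, a structured query can be easier to read than `grep`. The following is a minimal sketch, assuming a POSIX shell with `kubectl` and `awk` available; the function name `oom_killed_pods` is hypothetical and not part of BDC.

```bash
# List pods in a namespace whose containers were last terminated
# with reason OOMKilled.
# Usage: oom_killed_pods <namespace>
oom_killed_pods() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}
```

Call it as `oom_killed_pods <bdc-namespace>`. An empty result means no container in that namespace has OOMKilled as its last recorded termination reason.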

### How to fix

Increase the memory limit in your `values.yaml`:

```yaml
bricksRunner:
  cm:
    resources:
      requests:
        memory: "1024Mi"
      limits:
        memory: "2048Mi"
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -n <bdc-namespace> \
  -f values.yaml
```

{% hint style="info" %}
For sizing guidelines by cluster size, see [Sizing and Tuning](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning).
{% endhint %}
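
After the upgrade, it can help to confirm that the new limit actually reached the running pods. This is a sketch under the assumption that the runner pods live in the BDC namespace; the function name `pod_memory_limits` is hypothetical.

```bash
# Print each pod name followed by the memory limits of its containers.
# Usage: pod_memory_limits <namespace>
pod_memory_limits() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}'
}
```

Run `pod_memory_limits <bdc-namespace>` and check that the runner pods report the value you set. Pods created before the upgrade may keep the old limit until they are replaced.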

***

## Tasks stuck in pending

Tasks may remain in a pending state if BDC is not polling frequently enough or if the controller is overloaded.

### How to diagnose

```bash
kubectl logs -f deployment/bdc-bluebricks-deployments-controller -n <bdc-namespace>
```

Look for messages indicating polling intervals or task queue backups.

### How to fix

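**Increase polling frequency** by reducing `orchestrator.resyncPeriod` (default: 5s) in your `values.yaml`. To process more tasks in parallel, also raise `--max-concurrent-reconciles` in the controller arguments: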

```yaml
command:
  args:
    - "start"
    - "operator"
    - "--controllers=tasks"
    - "--max-concurrent-reconciles"
    - "4"
```

Increasing `--max-concurrent-reconciles` allows BDC to process more tasks in parallel. See the [sizing guidelines](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning) for recommended values.
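
The resync period listed in the quick reference can also be overridden at upgrade time. The sketch below assumes that `orchestrator.resyncPeriod` accepts a duration string in the same form as its default (`5s`); the function name `set_resync_period` is hypothetical, and `--set` is the standard Helm flag for overriding a single value.

```bash
# Upgrade the release with a different resync period while keeping the
# rest of the configuration from values.yaml.
# Usage: set_resync_period <namespace> <period>
set_resync_period() {
  helm upgrade bdc \
    oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
    -n "$1" -f values.yaml \
    --set "orchestrator.resyncPeriod=$2"
}
```

For example, `set_resync_period <bdc-namespace> 2s` polls more often than the default. A value passed with `--set` takes precedence over the same key in `values.yaml`, so update the file as well if the change should persist across future upgrades.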

***

## Auth errors in runner logs

Authentication errors in the runner logs usually indicate that BDC cannot reach the Bluebricks API, that its token has expired, or that its requests are being rate-limited.

### How to diagnose

```bash
kubectl logs -f deployment/bdc-bluebricks-deployments-controller -n <bdc-namespace> | grep -i auth
```

### How to fix

1. **Verify API connectivity**: confirm the runner can reach `api.bluebricks.co` on port 443. The check below assumes `curl` is available in the controller image.

```bash
kubectl exec -it deployment/bdc-bluebricks-deployments-controller -n <bdc-namespace> -- \
  curl -s -o /dev/null -w "%{http_code}" https://api.bluebricks.co/api/v1/health
```

2. **Rate limiting**: if you see rate-limit responses, increase `orchestrator.resyncPeriod` to lengthen the polling interval and reduce request volume
3. **Check credentials**: verify the BDC service account token or API key is valid
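
The status code printed by the `curl` check can be mapped to the steps above. The following sketch relies on general HTTP semantics and on curl reporting `000` when it receives no response; the exact codes returned by the Bluebricks API are an assumption, and `explain_status` is a hypothetical helper.

```bash
# Map the HTTP status code from the connectivity check to a likely cause.
# Usage: explain_status <http_code>
explain_status() {
  case "$1" in
    000)     echo "no response: check DNS, egress rules, or proxy settings" ;;
    2??)     echo "API reachable" ;;
    401|403) echo "credentials rejected: verify the token or API key" ;;
    429)     echo "rate limited: increase orchestrator.resyncPeriod" ;;
    5??)     echo "server-side error: retry later" ;;
    *)       echo "unexpected status: $1" ;;
  esac
}
```

For instance, `explain_status 429` points to the rate-limiting step, while `explain_status 000` suggests a network problem between the cluster and the API.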

***

## Storage errors

Storage errors occur when the persistent volume used by the storage manager is full.

### How to diagnose

```bash
kubectl get pvc -n <bdc-namespace>
kubectl describe pvc <pvc-name> -n <bdc-namespace>
```

### How to fix

Increase the storage size in your `values.yaml`:

```yaml
bbxStorageManager:
  storage:
    size: "30Gi"  # Minimum 10Gi; use 100Gi+ for high-volume clusters
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -n <bdc-namespace> \
  -f values.yaml
```

{% hint style="warning" %}
PVC resizing depends on your storage class supporting volume expansion. Check your cluster's storage class configuration before upgrading.
{% endhint %}
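
One way to check expansion support is to read the `allowVolumeExpansion` field of each storage class. This is a minimal sketch; the function name `storage_class_expansion` is hypothetical.

```bash
# Show whether each storage class in the cluster permits volume expansion.
# Usage: storage_class_expansion
storage_class_expansion() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,ALLOW_EXPANSION:.allowVolumeExpansion'
}
```

Compare the output with the storage class shown by `kubectl describe pvc`. If the class does not report `true`, a larger `size` in `values.yaml` is unlikely to resize the existing volume.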

***

## Image pull failures

If runner pods fail to start with `ImagePullBackOff` or `ErrImagePull`, the container images cannot be downloaded.

### Common causes

* **Registry unreachable**: the cluster cannot reach `ghcr.io/bluebricks-dev/`
* **Wrong image tag**: the specified version does not exist
* **Private registry without credentials**: if you mirror images to a private registry, pull secrets may be missing

### How to fix

1. Check which image reference the pod is trying to pull, and confirm that the tag exists in the registry:

```bash
kubectl describe pod <pod-name> -n <bdc-namespace> | grep "Image:"
```

2. Test registry access from the cluster:

```bash
kubectl run test-pull --image=ghcr.io/bluebricks-dev/bricks:latest --restart=Never -n <bdc-namespace>
kubectl describe pod test-pull -n <bdc-namespace>
kubectl delete pod test-pull -n <bdc-namespace>
```

3. If using a private registry, configure image pull secrets in your `values.yaml`
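
To find every affected pod at once rather than describing them one by one, the waiting reason of each container can be queried directly. A sketch, assuming `kubectl` and `awk` are available; `image_pull_failures` is a hypothetical helper name.

```bash
# List pods with a container waiting on ImagePullBackOff or ErrImagePull,
# together with the reported reason.
# Usage: image_pull_failures <namespace>
image_pull_failures() {
  kubectl get pods -n "$1" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}' \
    | awk -F '\t' '$2 ~ /ImagePullBackOff|ErrImagePull/ { print $1 "\t" $2 }'
}
```

Run `image_pull_failures <bdc-namespace>`, then use `kubectl describe pod` on any listed pod to read the pull error in its events.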

***

## Helm upgrade fails

Helm upgrades can fail due to version incompatibilities, CRD conflicts, or invalid values.

### Common causes

* **CRD already exists**: a previous installation left CRDs that conflict with the new version
* **Invalid values.yaml**: a field was renamed or removed in the new chart version
* **Pending Helm release**: a previous upgrade was interrupted, leaving the release in a pending status such as `pending-upgrade`

### How to fix

1. Check the current release status:

```bash
helm list -n <bdc-namespace>
helm history bdc -n <bdc-namespace>
```

2. If the release is stuck in a pending state:

```bash
helm rollback bdc <last-successful-revision> -n <bdc-namespace>
```

3. For CRD conflicts, check installed CRDs:

```bash
kubectl get crd | grep bluebricks
```

4. Review the chart's release notes for breaking changes before upgrading

{% hint style="info" %}
For the full list of configurable values, see the [BDC Helm Chart](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner/bdc-helm-chart) reference.
{% endhint %}
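
Two standard Helm flags can help narrow down a failed upgrade before retrying it. The sketch below uses `helm list --pending` to find stuck releases and `helm upgrade --dry-run` to render the chart without applying it; the function names are hypothetical, and whether a dry run catches a renamed or removed field depends on how the chart validates its values.

```bash
# List releases in the namespace that are stuck in a pending status.
# Usage: pending_releases <namespace>
pending_releases() {
  helm list -n "$1" --pending
}

# Render the upgrade with the current values.yaml without applying it,
# so template and validation errors appear before the release changes.
# Usage: dry_run_upgrade <namespace>
dry_run_upgrade() {
  helm upgrade bdc \
    oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
    -n "$1" -f values.yaml --dry-run
}
```

If `pending_releases <bdc-namespace>` lists the `bdc` release, roll it back as described in step 2 before running `dry_run_upgrade <bdc-namespace>`.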

***

## Sizing guidelines

| Cluster size         | Concurrent workers | Runner memory | Storage |
| -------------------- | ------------------ | ------------- | ------- |
| Small (< 10 nodes)   | 2-3                | 1024Mi        | 10Gi    |
| Medium (10-50 nodes) | 4-8                | 1024Mi        | 30Gi    |
| Large (50+ nodes)    | 8-16               | 2048Mi        | 100Gi+  |

***

## Still stuck?

1. Check BDC logs: `kubectl logs -f deployment/bdc-bluebricks-deployments-controller -n <bdc-namespace>`
2. Review the [Self-Hosted Runner](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner) documentation for installation and configuration details
3. [Contact support](https://www.bluebricks.co/support) with your Helm chart version, `values.yaml`, and relevant pod logs
