# Self-Hosted Runner

## Quick reference

| Symptom                    | Cause                              | Fix                                                 |
| -------------------------- | ---------------------------------- | --------------------------------------------------- |
| Runner pods OOMKilled      | Insufficient memory limits         | Increase `bricksRunner.cm.resources.limits.memory`  |
| Tasks stuck in pending     | Controller processing bottleneck   | Increase `--max-concurrent-reconciles` (default: 3) |
| Auth errors in runner logs | API rate limiting or expired token | Verify API connectivity; check credentials          |
| Storage errors             | Persistent volume full             | Increase `bbxStorageManager.storage.size`           |
| Image pull failures        | Registry unreachable or wrong tag  | Verify image references and registry access         |
| Helm upgrade fails         | Version mismatch or CRD conflict   | Check chart version compatibility                   |

***

## Runner pods OOMKilled

When a Bricks Runner pod exceeds its memory limit, Kubernetes terminates the container and reports an `OOMKilled` status. This typically happens with large Terraform state files or complex plans.

### How to diagnose

```bash
kubectl get pods -n <bdc-namespace> | grep -i oom
kubectl describe pod <pod-name> -n <bdc-namespace> | grep -A 5 "Last State"
```
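
A pod that has already restarted may show `Running` again, so the `grep` above can miss it. The following is a minimal sketch that filters on each container's last termination reason instead. The helper name `oom_killed_pods` is hypothetical and not part of BDC; the `jsonpath` fields are standard Kubernetes pod status fields.

```bash
# Hypothetical helper: reads "<pod>\t<last termination reasons>" lines on stdin
# and prints the names of pods where at least one container was OOMKilled.
oom_killed_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Usage against a live cluster (replace <bdc-namespace>):
# kubectl get pods -n <bdc-namespace> \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#   | oom_killed_pods
```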

### How to fix

Increase the memory limit in your `values.yaml`:

```yaml
bricksRunner:
  cm:
    resources:
      requests:
        memory: "1024Mi"
      limits:
        memory: "4096Mi"  # Chart default is 2048Mi; raise above it to relieve OOM kills
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -f values.yaml \
  -n <bdc-namespace>
```

{% hint style="info" %}
For sizing guidelines by cluster size, see [Sizing and Tuning](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning).
{% endhint %}

***

## Tasks stuck in pending

Tasks may remain in a pending state if the controller is overloaded and cannot process the task queue fast enough.

### How to diagnose

```bash
kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace>
```

Look for messages indicating task queue backups or reconciliation delays.

### How to fix

**Increase parallelism** by raising `--max-concurrent-reconciles` in your `values.yaml`. The default is `3`:

```yaml
command:
  args:
    - "start"
    - "operator"
    - "--controllers=all"
    - "--max-concurrent-reconciles"
    - "6"
```

This allows BDC to process more tasks in parallel. See the [sizing guidelines](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner#sizing-and-tuning) for recommended values.

***

## Auth errors in runner logs

Authentication errors in the runner logs usually indicate that BDC cannot reach the Bluebricks API, that its requests are being rate-limited, or that its credentials have expired.

### How to diagnose

```bash
kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace> | grep -i auth
```

### How to fix

1. **Verify API connectivity**: confirm the runner can reach `api.bluebricks.co` on port 443

```bash
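# kubectl exec does not accept a label selector, so resolve the pod name first
POD=$(kubectl get pods -n <bdc-namespace> -l app.kubernetes.io/name=bluebricks-deployments-controller -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -n <bdc-namespace> -- \
  curl -s -o /dev/null -w "%{http_code}" https://api.bluebricks.co/health/isalive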
```

2. **Rate limiting**: if you see rate-limit responses, increase the polling interval to reduce request volume
3. **Check credentials**: verify the BDC service account token or API key is valid
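
To tie the status code printed by the connectivity check to the three steps above, a small lookup can help. This is a sketch only: `classify_api_status` is a hypothetical helper, and the mapping relies on general HTTP semantics (401/403 for rejected credentials, 429 for rate limiting) plus curl's convention of printing `000` when it receives no HTTP response. The codes the Bluebricks API actually returns are an assumption.

```bash
# Hypothetical helper: maps an HTTP status code (as printed by curl's
# %{http_code}) to the most likely cause among the steps above.
classify_api_status() {
  case "$1" in
    000)     echo "unreachable: no HTTP response (DNS, firewall, or proxy)" ;;
    2??)     echo "reachable: API responded normally" ;;
    401|403) echo "credentials: token or API key rejected" ;;
    429)     echo "rate-limited: reduce request volume" ;;
    *)       echo "unexpected: HTTP $1" ;;
  esac
}

# Usage: pass the code printed by the curl check.
# classify_api_status 429
```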

***

## Storage errors

Storage errors occur when the persistent volume used by the storage manager is full.

### How to diagnose

```bash
kubectl get pvc -n <bdc-namespace>
kubectl describe pvc <pvc-name> -n <bdc-namespace>
```

### How to fix

Increase the storage size in your `values.yaml`:

```yaml
bbxStorageManager:
  storage:
    size: "30Gi"  # Minimum 10Gi; use 100Gi+ for high-volume clusters
```

Then upgrade the Helm release:

```bash
helm upgrade bdc \
  oci://europe-docker.pkg.dev/bbx-registry-prod/helm/bluebricks-deployments-controller \
  -f values.yaml \
  -n <bdc-namespace>
```

{% hint style="warning" %}
PVC resizing depends on your storage class supporting volume expansion. Check your cluster's storage class configuration before upgrading.
{% endhint %}
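
One way to check volume expansion support before upgrading is to read the standard `allowVolumeExpansion` field on each storage class. The sketch below lists classes where it is not set to `true`. The helper name `non_expandable_classes` is hypothetical.

```bash
# Hypothetical helper: reads "<storage-class>\t<allowVolumeExpansion>" lines
# on stdin and prints the classes that do NOT allow volume expansion.
# An empty second field means the flag is unset, which Kubernetes treats as false.
non_expandable_classes() {
  awk -F '\t' '$1 != "" && $2 != "true" { print $1 }'
}

# Usage against a live cluster:
# kubectl get storageclass \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.allowVolumeExpansion}{"\n"}{end}' \
#   | non_expandable_classes
```

If the storage class backing the BDC PVC appears in the output, raising `bbxStorageManager.storage.size` alone is unlikely to resize the existing volume.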

***

## Image pull failures

If runner pods fail to start with `ImagePullBackOff` or `ErrImagePull`, the container images cannot be downloaded.

### Common causes

* **Registry unreachable**: the cluster cannot reach the image registry. The default Helm chart uses `europe-docker.pkg.dev/bbx-registry-prod/public-oci/`
* **Wrong image tag**: the specified version does not exist
* **Private registry without credentials**: if you mirror images to a private registry, pull secrets may be missing

### How to fix

1. Check which image the pod is trying to pull:

```bash
kubectl describe pod <pod-name> -n <bdc-namespace> | grep "Image:"
```

2. Identify which image is failing to pull. BDC uses two different images: the **controller image** (`bdctl`) that runs the operator, and the **runner image** (`bricks`) that executes IaC operations.

3. If using a private registry, configure image pull secrets in your `values.yaml`
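
To spot images served from somewhere other than the default registry mentioned under Common causes, the image references can be compared against that prefix. A minimal sketch follows; `non_default_images` is a hypothetical helper, and whether a flagged image needs a pull secret depends on your own registry setup.

```bash
# Default registry prefix used by the Helm chart (from the Common causes list).
DEFAULT_REGISTRY="europe-docker.pkg.dev/bbx-registry-prod/public-oci/"

# Hypothetical helper: reads one image reference per line on stdin and prints
# those that do not start with the default registry prefix.
non_default_images() {
  while IFS= read -r image; do
    if [ -z "$image" ]; then
      continue                      # skip blank lines
    fi
    case "$image" in
      "$DEFAULT_REGISTRY"*) ;;      # served from the default registry
      *) echo "$image" ;;           # mirrored or custom: check tag and pull secrets
    esac
  done
}

# Usage against a live cluster (replace <bdc-namespace>):
# kubectl get pods -n <bdc-namespace> \
#   -o jsonpath='{range .items[*].spec.containers[*]}{.image}{"\n"}{end}' \
#   | sort -u | non_default_images
```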

***

## Helm upgrade fails

Helm upgrades can fail due to version incompatibilities, CRD conflicts, or invalid values.

### Common causes

* **CRD already exists**: a previous installation left CRDs that conflict with the new version
* **Invalid values.yaml**: a field was renamed or removed in the new chart version
* **Pending Helm release**: a previous upgrade was interrupted, leaving the release in a bad state

### How to fix

1. Check the current release status:

```bash
helm list -n <bdc-namespace>
helm history bdc -n <bdc-namespace>
```

2. If the release is stuck in a pending state:

```bash
helm rollback bdc <last-successful-revision> -n <bdc-namespace>
```

3. For CRD conflicts, check installed CRDs:

```bash
kubectl get crd | grep bluebricks
```

4. Review the chart's release notes for breaking changes before upgrading
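
The `<last-successful-revision>` placeholder in step 2 can be read from the `helm history` output. The sketch below picks the highest revision whose status is `deployed` or `superseded`. The helper name `last_successful_revision` is hypothetical, and the field position is an assumption about Helm's default table layout, where the `UPDATED` timestamp spans five whitespace-separated fields and `STATUS` is therefore the seventh.

```bash
# Hypothetical helper: reads `helm history` table output on stdin and prints
# the highest revision number with status "deployed" or "superseded".
# Prints nothing if no such revision exists.
last_successful_revision() {
  awk 'NR > 1 && ($7 == "deployed" || $7 == "superseded") {
         if ($1 + 0 > rev) rev = $1 + 0
       }
       END { if (rev > 0) print rev }'
}

# Usage (replace <bdc-namespace>):
# helm history bdc -n <bdc-namespace> | last_successful_revision
```

Confirm the revision against the full history before passing it to `helm rollback`.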

{% hint style="info" %}
For the full list of configurable values, see the [BDC Helm Chart](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner/bdc-helm-chart) reference.
{% endhint %}

***

## Sizing guidelines

| Cluster size         | Concurrent workers | Runner memory | Storage |
| -------------------- | ------------------ | ------------- | ------- |
| Small (< 10 nodes)   | 2-3                | 2048Mi        | 10Gi    |
| Medium (10-50 nodes) | 4-8                | 2048Mi        | 30Gi    |
| Large (50+ nodes)    | 8-16               | 4096Mi        | 100Gi+  |

{% hint style="warning" %}
The Helm chart sets runner memory to 2048Mi by default, which is also the minimum that Terraform operations require. Do not set runner memory below this value.
{% endhint %}
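
As a quick way to look up the matching row for a cluster, the table can be expressed as a small function keyed on node count. This is a sketch: `sizing_for_nodes` is a hypothetical helper, and because the table lists 50 nodes in both the Medium and Large rows, treating exactly 50 as Medium is an assumption.

```bash
# Hypothetical helper: prints the sizing guideline for a given node count.
# Assumption: exactly 50 nodes is treated as Medium.
sizing_for_nodes() {
  nodes=$(($1))   # arithmetic expansion tolerates padding from `wc -l`
  if [ "$nodes" -lt 10 ]; then
    echo "small: workers=2-3 memory=2048Mi storage=10Gi"
  elif [ "$nodes" -le 50 ]; then
    echo "medium: workers=4-8 memory=2048Mi storage=30Gi"
  else
    echo "large: workers=8-16 memory=4096Mi storage=100Gi+"
  fi
}

# Usage against a live cluster:
# sizing_for_nodes "$(kubectl get nodes --no-headers | wc -l)"
```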

***

## Need more help?

1. Check BDC logs: `kubectl logs -f -l app.kubernetes.io/name=bluebricks-deployments-controller -n <bdc-namespace>`
2. Review the [Self-Hosted Runner](https://docs.bluebricks.co/bluebricks-documentation/security/bluebricks-self-hosted-runner) documentation for installation and configuration details
3. [Contact support](https://www.bluebricks.co/support) with your Helm chart version, `values.yaml`, and relevant pod logs

