Troubleshooting
Common issues and diagnostic steps.
PXE Boot Issues
Node doesn't get an IP
- Check DHCP is running on
nuc-00-01:ssh ${IP_PREFIX}.8 systemctl status dhcpd - Verify the MAC address in
dhcpd.confmatches the actual NIC MAC - Check that no other DHCP server is active on the network segment
iPXE menu doesn't appear (node boots from disk)
- Verify TFTP is serving
ipxe.efi:tftp ${IP_PREFIX}.8 -c get ipxe.efi /tmp/test.efi && echo OK - Check the DHCP
next-serverandfilenameoptions indhcpd.conf - Verify Apache is serving the iPXE menu:
curl http://${IP_PREFIX}.10/harvester/harvester/ipxe-menu
Harvester install stalls
- Check that all config YAML files are rendered (no
${VAR}placeholders remain):grep '\${' /srv/www/htdocs/harvester/harvester/config-create-nuc-01.yaml - Verify DNS resolves the Harvester VIP before second/third node joins
- Check Harvester installer logs on the console
DNS Issues
Cluster nodes can't resolve hostnames
- Verify BIND is running:
ssh ${IP_PREFIX}.8 systemctl status named - Test resolution from a cluster node:
dig @${IP_PREFIX}.8 rancher.${BASE_DOMAIN} - Check that
nuc-00-01is the first DNS server in/etc/resolv.confon each node
Rancher Manager Issues
Rancher UI unreachable
- Check the Keepalived VIP is up:
ping -c 1 ${IP_PREFIX}.193 - Check HAProxy is forwarding to Rancher VMs:
ssh ${IP_PREFIX}.93 systemctl status haproxy - Verify Rancher K3s VMs are running inside Harvester
cert-manager webhook failures
Wait 2–3 minutes after cert-manager install before running the Rancher Helm chart. The webhook needs time to become ready.
kubectl --kubeconfig ~/.kube/rancher.yaml -n cert-manager wait \
--for=condition=ready pod --selector=app.kubernetes.io/component=webhook \
--timeout=120s
Environment / Registry Issues
Image pull failures (Carbide/Enclave)
- Verify registry credentials are correct:
bash Scripts/modules/carbide/registry_auth.sh - Check Harvester registry mirror configuration:
kubectl --kubeconfig ~/.kube/harvester.yaml get configmap -n harvester-system - For Enclave: verify the Hauler store is populated and Harbor is running
envsubst leaves ${VAR} placeholders
Ensure the environment is sourced before running scripts:
source Scripts/env.sh
echo $BASE_DOMAIN # should print the domain, not "${BASE_DOMAIN}"
General Diagnostic Commands
# Check all pods across all namespaces
kubectl --kubeconfig ~/.kube/harvester.yaml get pods -A | grep -v Running
# Recent events (errors first)
kubectl --kubeconfig ~/.kube/harvester.yaml get events -A --sort-by='.lastTimestamp' | tail -20
# Harvester node logs
kubectl --kubeconfig ~/.kube/harvester.yaml logs -n harvester-system -l app=harvester --tail=50
# Check cluster etcd health
kubectl --kubeconfig ~/.kube/harvester.yaml -n kube-system exec -it \
$(kubectl --kubeconfig ~/.kube/harvester.yaml get pods -n kube-system -l component=etcd -o name | head -1) \
-- etcdctl endpoint health