Common Issues & Solutions
Quick reference for resolving frequently encountered problems with homelab-autoscaler.
🔍 Quick Diagnostics
Controller Not Starting
# Check if CRDs are installed
kubectl get crds | grep homecluster
# Check pod status
kubectl get pods -n homelab-autoscaler-system
# Check logs for errors
kubectl logs -n homelab-autoscaler-system deployment/homelab-autoscaler-controller-manager
Common Fixes:
- Run
make installto install CRDs - Ensure you're using the correct namespace:
homelab-autoscaler-system
gRPC Connection Issues
# Check if gRPC service is running
kubectl get svc -n homelab-autoscaler-system | grep grpc
# Test gRPC connectivity
kubectl run grpc-test --rm -it --image=nicolaka/netshoot -- \
grpc_cli call homelab-autoscaler-grpc:50051 NodeGroups ""
Common Fixes:
- Verify gRPC service is enabled in Helm values
- Check network policies if using in production
🛠️ Common Operations
Manual Node Power Control
# Power on a specific node
kubectl patch node.infra.homecluster.dev <node-name> \
--type='merge' -p='{"spec":{"powerState":"on"}}'
# Power off a specific node
kubectl patch node.infra.homecluster.dev <node-name> \
--type='merge' -p='{"spec":{"powerState":"off"}}'
# Force status refresh
kubectl annotate node.infra.homecluster.dev <node-name> \
homelab-autoscaler.dev/force-reconcile=$(date +%s)
Stuck Job Cleanup
# List all jobs in the namespace
kubectl get jobs -n homelab-autoscaler-system
# Delete failed jobs
kubectl delete jobs -n homelab-autoscaler-system --field-selector=status.successful=0
# Clean up completed jobs
kubectl delete jobs -n homelab-autoscaler-system --field-selector=status.successful=1
Resource Inspection
# Get detailed resource status
kubectl describe group <group-name> -n homelab-autoscaler-system
kubectl describe node.infra.homecluster.dev <node-name> -n homelab-autoscaler-system
# Check resource conditions
kubectl get group <group-name> -n homelab-autoscaler-system -o jsonpath='{.status.conditions}'
kubectl get node.infra.homecluster.dev <node-name> -n homelab-autoscaler-system -o jsonpath='{.status.conditions}'
⚡ Performance Issues
High Resource Usage
# Check controller resource usage
kubectl top pods -n homelab-autoscaler-system
# Monitor API server calls
kubectl get --raw /metrics | grep apiserver_request_total
Tuning Options:
- Adjust reconciliation intervals in Helm values
- Increase resource limits for controller
- Reduce log verbosity if debug logging is enabled
Slow Scaling Operations
# Check operation timing in logs
kubectl logs -n homelab-autoscaler-system deployment/homelab-autoscaler-controller-manager | grep -i "duration\|timeout"
# Monitor job execution times
kubectl get jobs -n homelab-autoscaler-system -o wide
Optimizations:
- Tune job timeouts in Node CRD specifications
- Optimize power operation scripts for faster execution
- Consider higher resource limits for power operation jobs
🐛 Debugging Specific Scenarios
Nodes Not Powering On/Off
- Check job status:
kubectl describe job <job-name> - Verify power operation scripts work manually
- Check node power state:
kubectl describe node.infra.homecluster.dev <node-name>
Autoscaling Not Triggering
- Verify resource utilization metrics
- Check group scaling thresholds
- Ensure gRPC server is responding to Cluster Autoscaler
Resource State Mismatch
- Run manual sync: Annotate resource with
homelab-autoscaler.dev/force-reconcile - Check Core controller logs for sync issues
- Verify Kubernetes node exists and is healthy
📋 Emergency Procedures
Complete System Reset
# Delete all custom resources (CAUTION: Destructive!)
kubectl delete groups --all -n homelab-autoscaler-system
kubectl delete nodes.infra.homecluster.dev --all -n homelab-autoscaler-system
# Restart controller
kubectl rollout restart deployment/homelab-autoscaler-controller-manager -n homelab-autoscaler-system
Database Corruption Recovery
# Export current state for backup
kubectl get groups -n homelab-autoscaler-system -o yaml > groups-backup.yaml
kubectl get nodes.infra.homecluster.dev -n homelab-autoscaler-system -o yaml > nodes-backup.yaml
# Restore from backup
kubectl apply -f groups-backup.yaml
kubectl apply -f nodes-backup.yaml
For more detailed debugging, see the Debugging Guide. Report persistent issues on GitHub Issues.