Debugging and General tips (Day 10)
Common debugging patterns and tips.
After setting up and debugging various parts, I thought I'd share some basic tips that have helped me along the way.
Managing Multiple Clusters
Here's how to merge multiple kubeconfig files:
KUBECONFIG=~/.kube/config:~/.kube/config.cluster2 kubectl config view --flatten > ~/.kube/config.merged
cp ~/.kube/config ~/.kube/config.backup
mv ~/.kube/config.merged ~/.kube/config
You can then rename contexts for better clarity:
kubectl config rename-context default prism
kubectl config rename-context kubernetes-admin@kubernetes atlas
And set proper permissions on your kube config:
chmod 600 ~/.kube/config
Node Scheduling Issues
If pods aren't scheduling on control plane nodes (I'm using 3 control plane nodes), check for taints:
kubectl get nodes -o json | jq '.items[].spec.taints'
To remove control-plane taints if needed:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
Troubleshooting Tips
In general, most issues can be found and solved by following a pattern:
- Get the resource
- Describe it
- Follow the trail of related resources
- Check the related logs
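The pattern above can be sketched as a small shell helper. This is only an illustration: `trace_resource` is a hypothetical name, and the helper assumes `kubectl` is already pointed at the right cluster. It covers the get, describe, and events steps; which pod's logs to read depends on what the describe output points to, so that step is left manual.

```shell
# Hypothetical helper (not part of kubectl): follow the get -> describe -> events trail.
# Usage: trace_resource <namespace> <kind> <name>
trace_resource() {
  # 1. Get the resource to confirm it exists and see its status columns.
  kubectl -n "$1" get "$2" "$3" -o wide
  # 2. Describe it; the Events and status sections usually name the next resource to inspect.
  kubectl -n "$1" describe "$2" "$3"
  # 3. List events in the namespace, sorted by last timestamp.
  kubectl -n "$1" get events --sort-by=.lastTimestamp
}
```

For example, `trace_resource argocd certificate argocd-certificate` would run the three steps against the certificate used later in this post.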
An example of a certificate issue:
Certificate Issues
Follow the chain of resources when debugging cert-manager:
kubectl get certificate -n argocd
kubectl -n argocd describe certificate argocd-certificate
kubectl -n argocd describe certificaterequests.cert-manager.io argocd-certificate-1
kubectl -n argocd describe order argocd-certificate-1-1494176820
kubectl -n cert-manager logs pods/cert-manager-<some-hash>
Other times, simply deleting a resource and letting it be recreated solves the issue. For example, when switching from staging to production Let's Encrypt, you may need to delete the old secrets or orders so that they are recreated:
kubectl -n argocd delete secrets argocd-tls
Network Debugging
When services aren't reachable:
- Check firewall rules and network policies between VLANs
- Use dig or nslookup to verify DNS resolution
- Verify LoadBalancer IP assignments
- Use tcpdump and netstat for network debugging:
# Check listening ports
netstat -tlpn
# Monitor ARP requests
tcpdump -i any -n arp
LoadBalancer Configuration
If you are setting up a new cluster with kubeadm (not on a cloud provider), use MetalLB or Cilium to assign load balancer IP addresses.
If using Cilium, here's a sample configuration:
apiVersion: "cilium.io/v2alpha1"
kind: CiliumLoadBalancerIPPool
metadata:
  name: "lb-pool"
spec:
  blocks:
    - cidr: "192.168.30.140/30"
---
apiVersion: "cilium.io/v2alpha1"
kind: CiliumL2AnnouncementPolicy
metadata:
  name: cilium-l2-announce
spec:
  externalIPs: true
  loadBalancerIPs: true
  interfaces:
    - eth0
All services run through Traefik, so a few load balancer IPs are plenty.
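To put a number on "a few": the /30 block in the sample pool spans four addresses, 192.168.30.140 through 192.168.30.143. The arithmetic is sketched below; `cidr_size` is a hypothetical helper name, and it only computes the raw size of the block. Whether every address is actually handed out depends on your Cilium version and pool settings, so treat the result as an upper bound.

```shell
# Hypothetical helper: number of addresses in an IPv4 block with the given prefix length.
cidr_size() {
  # A /N block leaves 32 - N host bits, so it holds 2^(32 - N) addresses.
  echo $(( 1 << (32 - $1) ))
}

cidr_size 30   # prints 4, the size of the /30 in the sample pool
```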
Helm and Argo CD Debugging
To debug Argo CD applications, you can render the chart:
helm template . -f values.yaml > rendered-app.yaml
And for helmfile:
helmfile template > rendered.yaml
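Once the chart is rendered, one possible next step is to feed the output through a client-side dry run so that malformed manifests surface before Argo CD tries to sync them. The sketch below is only an assumption about how you might wire this together: `check_rendered` is a hypothetical name, it expects to be run from the chart directory with a `values.yaml` present, and the dry run may still contact the cluster to look up resource types.

```shell
# Hypothetical helper: render the chart in the current directory, then validate the output.
# Usage: check_rendered [output-file]   (defaults to rendered-app.yaml)
check_rendered() {
  file="${1:-rendered-app.yaml}"
  # Render the chart; stop here if templating fails.
  helm template . -f values.yaml > "$file" || return 1
  # Client-side dry run: validates the manifests without persisting any changes.
  kubectl apply --dry-run=client -f "$file"
}
```

Reading the rendered file directly is still the quickest way to spot a wrong value; the dry run mainly catches structural mistakes.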