Kubernetes DNS Linux Issue Caused by Missing br_netfilter kernel module
What You’ll Learn
In this course you will learn how to:
- Troubleshoot DNS issues in Kubernetes
- Check whether the br_netfilter kernel module is loaded on a node
- Verify that Sisense checked during installation/update whether br_netfilter was loaded
Use Cases For Applying What You Will Learn
Sisense Kubernetes pods fail to start, and DNS lookups fail with the error "reply from unexpected source".
Prerequisites
- Familiarity with Sisense UI
- Basic knowledge of Linux
- Basic knowledge of Kubernetes
Scenario - DNS failure
A customer reported a problem with several pods failing to start. The pods were waiting for the rabbitmq service to initialize, but that service failed to start as well.
First hypothesis — DNS failure
We tested the failing rabbitmq pod and determined that the DNS was not working properly — rabbitmq could not connect to the kube-apiserver. When we exec’d into the rabbitmq pod and tried connecting to the kubernetes.default.svc.cluster.local mentioned in the rabbitmq config map, the DNS name resolution failed.
To test this hypothesis we replaced kubernetes.default.svc.cluster.local in the rabbitmq configmap with the IP of the kube-apiserver and the rabbitmq pod started!
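As a side note, the in-cluster IP of the kube-apiserver is the ClusterIP of the kubernetes service in the default namespace, so the test can be reproduced roughly like this (the configmap name and namespace below are assumptions for illustration; use the ones from your deployment):

# Find the ClusterIP behind kubernetes.default.svc.cluster.local
kubectl get svc kubernetes -n default -o jsonpath='{.spec.clusterIP}'

# Replace the DNS name with that IP in the rabbitmq configmap
# (configmap name and namespace are illustrative)
kubectl edit configmap rabbitmq-config -n sisense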
Why was DNS failing?
That was the obvious next question. We used the Debugging DNS Resolution article from the official Kubernetes documentation to test DNS in the cluster. For example, we used a modified pod definition from that article to spin up 3 dnsutils test pods, one on each of the nodes:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  nodeName: node1 # schedule the pod to a specific node
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command:
      - sleep
      - "infinity"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
You can see that we added a nodeName parameter to assign the pod to a particular node.
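To create the test pods and check where each one landed, commands along these lines can be used (the manifest file name is just an example):

# Create the dnsutils test pod (repeat with a different metadata.name and nodeName per node)
kubectl apply -f dnsutils.yaml

# Confirm the pod is Running and which node it was scheduled to
kubectl get pod dnsutils -o wide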
Tracing the DNS failure
On each of the nodes, we exec’d into the dnsutils pod to check DNS connectivity using the following command.
root@test:/# nslookup kubernetes.default.svc.cluster.local
Server: 10.32.0.10
Address: 10.32.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.32.0.23
;; reply from unexpected source: 10.200.2.32#53, expected 10.32.0.10#53
The result of the command issued above was 'reply from unexpected source' on the nodes where rabbitmq was failing. We checked the IPs and realized that the reply came from the CoreDNS pod directly and not from the kube-dns 'umbrella' service. Why would that happen?
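To compare the IPs yourself, you can list the kube-dns service and the CoreDNS pods in the kube-system namespace; a quick sketch (the k8s-app=kube-dns label is the usual one for CoreDNS, but it may differ in your distribution):

# ClusterIP of the kube-dns service (the address nslookup was pointed at)
kubectl get svc kube-dns -n kube-system

# Pod IPs of the CoreDNS replicas (one of these answered directly)
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide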
br_netfilter kernel module missing
We started searching the Internet for the exact error and found this article: Debugging Kubernetes Networking. Others had hit the same problem and performed rigorous DNS troubleshooting in their cluster.
In the end the root cause was attributed to the absence of the br_netfilter kernel module on some of the nodes. In our case, after the nodes restarted, the Linux nodes' iptables could not correctly see bridged traffic because the br_netfilter kernel module was not loaded on 2 out of 3 nodes. We could check that by listing the contents of the /proc/sys/net folder on the affected nodes:
root@worker-0:~# ll /proc/sys/net
total 0
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ./
dr-xr-xr-x 1 root root 0 Nov 14 10:10 ../
dr-xr-xr-x 1 root root 0 Nov 14 11:42 core/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv4/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv6/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 iw_cm/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 netfilter/
-rw-r--r-- 1 root root 0 Nov 14 11:42 nf_conntrack_max
dr-xr-xr-x 1 root root 0 Nov 14 11:42 unix/
The bridge folder was missing.
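Another quick way to confirm whether the module is loaded is to ask the kernel directly:

# Prints a line for br_netfilter if the module is loaded; prints nothing otherwise
lsmod | grep br_netfilter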
As a result, the DNS service was failing because responses to DNS requests on the affected nodes came back from the CoreDNS pod directly instead of from the kube-dns service the requests were sent to.
To resolve the problem we explicitly loaded the missing kernel module by running: sudo modprobe br_netfilter.
The bridge folder was then created and appeared in the /proc/sys/net directory:
root@worker-0:~# ll /proc/sys/net
total 0
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ./
dr-xr-xr-x 1 root root 0 Nov 14 10:10 ../
dr-xr-xr-x 1 root root 0 Dec 21 17:13 bridge/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 core/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv4/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv6/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 iw_cm/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 netfilter/
-rw-r--r-- 1 root root 0 Nov 14 11:42 nf_conntrack_max
dr-xr-xr-x 1 root root 0 Nov 14 11:42 unix/
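Because the module went missing after a node restart, it is worth making the fix persistent as well. A minimal sketch for systemd-based distributions (the file names are illustrative):

# Load br_netfilter automatically on every boot
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf

# Make bridged traffic visible to iptables (a standard Kubernetes prerequisite)
echo 'net.bridge.bridge-nf-call-iptables = 1' | sudo tee /etc/sysctl.d/99-kubernetes-bridge.conf
sudo sysctl --system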
Does Sisense check if the module is loaded?
Short answer: yes. During installation Sisense checks that the br_netfilter kernel module is loaded, along with the other modules listed in kubespray/roles/rke/defaults/main.yml. The check is performed by the 'Enable kernel modules' task in kubespray/roles/rke/tasks/kernel.yml.
## Kernel Modules
default_kernel_modules:
- br_netfilter
- ip6_udp_tunnel
- ip_set
- ip_set_hash_ip
- ip_set_hash_net
- iptable_filter
- iptable_nat
- iptable_mangle
- iptable_raw
- nf_conntrack_netlink
- nf_conntrack
- nf_defrag_ipv4
- nf_nat
- nfnetlink
- udp_tunnel
- veth
- vxlan
- x_tables
- xt_addrtype
- xt_conntrack
- xt_comment
- xt_mark
- xt_multiport
- xt_nat
- xt_recent
- xt_set
- xt_statistic
- xt_tcpudp
To check whether that step passed, look at the installation log file sisense-ansible.log. The relevant entries should look like this:
2022-12-15 14:32:51,628 p=5296 u=ubuntu n=ansible | TASK [rke : Enable kernel modules] *********************************************************************************************************************************************************************************************************
2022-12-15 14:32:51,629 p=5296 u=ubuntu n=ansible | ok: [node143] => (item=br_netfilter)
2022-12-15 14:32:51,630 p=5296 u=ubuntu n=ansible | ok: [node44] => (item=br_netfilter)
2022-12-15 14:32:51,636 p=5296 u=ubuntu n=ansible | ok: [node208] => (item=br_netfilter)
(or 'changed' instead of 'ok' if the module was not loaded before and was just enabled by the installer)
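In a long installation log the quickest way to find these entries is to grep for the task name (the log file is assumed to be in the current directory; adjust the path to your installer folder):

# Show the kernel-modules task and the per-node results that follow it
grep -A 5 'Enable kernel modules' sisense-ansible.log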
Summary
We hope the information in this article proves helpful if you encounter the 'reply from unexpected source' DNS error in your deployment.
References
Debugging DNS Resolution
Debugging Kubernetes Networking
What's bridge-netfilter?
KubeDNS not working inside of pod when its containers are on the same node with kube-dns containers - GitHub issue