antonvolov, Sisense Team Member

Kubernetes DNS Linux issue caused by missing br_netfilter kernel module

What You’ll Learn

In this article you will learn how to:

  • Troubleshoot DNS issues in Kubernetes
  • Check whether the br_netfilter kernel module is loaded on a node
  • Verify that Sisense checked during installation/update that br_netfilter was loaded

Use Cases For Applying What You Will Learn 

Sisense Kubernetes pods fail to start, and DNS lookups fail with the error “reply from unexpected source”.

Prerequisites

  • Familiarity with Sisense UI
  • Basic knowledge of Linux
  • Basic knowledge of Kubernetes

Scenario - DNS failure

The customer reported a problem with several pods failing to start. The pods were waiting for the rabbitmq service to initialize, but it was failing to start as well.
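A quick way to see which pods are stuck is to list them and filter out the healthy ones. This is only a sketch; the sisense namespace is an assumption, so adjust it to your deployment:

# List Sisense pods and hide the ones that are already Running
kubectl get pods -n sisense | grep -v Running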

 

[Screenshot: antonvolov_0-1672961865697.png]

First hypothesis — DNS failure

We tested the failing rabbitmq pod and determined that DNS was not working properly: rabbitmq could not connect to the kube-apiserver. When we exec’d into the rabbitmq pod and tried connecting to kubernetes.default.svc.cluster.local, the address referenced in the rabbitmq ConfigMap, DNS name resolution failed.
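The check can be reproduced with a command along these lines; the pod name and namespace are placeholders, and it assumes nslookup (or a similar tool) is available in the container image:

# Resolve the kube-apiserver service name from inside the rabbitmq pod
kubectl exec -it <rabbitmq-pod-name> -n <namespace> -- nslookup kubernetes.default.svc.cluster.local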

 

[Screenshot: antonvolov_1-1672961865601.png]

To test this hypothesis, we replaced kubernetes.default.svc.cluster.local in the rabbitmq ConfigMap with the IP of the kube-apiserver, and the rabbitmq pod started!
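If you want to try the same workaround, the kube-apiserver IP is the ClusterIP of the kubernetes service in the default namespace; the ConfigMap name and namespace below are placeholders for your deployment:

# Find the kube-apiserver service ClusterIP
kubectl get svc kubernetes -n default
# Edit the rabbitmq ConfigMap and replace the DNS name with that IP
kubectl edit configmap <rabbitmq-configmap-name> -n <namespace>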

Why was DNS failing?

That was the obvious next question. We used the Debugging DNS Resolution article from the official Kubernetes documentation to test DNS in the cluster. For example, we used a modified pod definition from that article to spin up three dnsutils test pods, one on each of the nodes:

apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  nodeName: node1 # schedule the pod to a specific node
  containers:
  - name: dnsutils
    image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
    command:
      - sleep
      - "infinity"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

You can see that we added a nodeName parameter to pin the pod to a particular node.
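One way to create one test pod per node is to template the name and nodeName fields before applying the manifest. This is only a sketch; the node names and the dnsutils.yaml file name are assumptions:

# Create a dnsutils pod on each node by substituting the pod name and nodeName
for node in node1 node2 node3; do
  sed -e "s/name: dnsutils/name: dnsutils-${node}/" \
      -e "s/nodeName: node1/nodeName: ${node}/" dnsutils.yaml | kubectl apply -f -
done
# Confirm each pod landed on its intended node
kubectl get pods -o wide | grep dnsutils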

Tracing the DNS failure 

On each of the nodes, we exec’d into the dnsutils pod to check DNS connectivity using the following command. 

root@test:/# nslookup kubernetes.default.svc.cluster.local
Server:  10.32.0.10
Address: 10.32.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.32.0.23
;; reply from unexpected source: 10.200.2.32#53, expected 10.32.0.10#53

The result of the command issued above was ‘reply from unexpected source’ on the nodes where rabbitmq was failing. We checked the IPs and realized that the reply came directly from the CoreDNS pod and not from the kube-dns ‘umbrella’ service. Why would that happen?
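To compare the IPs yourself, look at the DNS service and the CoreDNS pods; the service is typically named kube-dns in the kube-system namespace even when CoreDNS is the backend, so adjust if your cluster differs. In this cluster, 10.32.0.10 was the service ClusterIP and 10.200.2.32 was one of the pod IPs:

# ClusterIP of the DNS service (the address clients send queries to)
kubectl get svc kube-dns -n kube-system
# Pod IPs of the CoreDNS replicas (the address the reply unexpectedly came from)
kubectl get pods -n kube-system -o wide | grep coredns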

br_netfilter kernel module missing

We started searching the Internet for the exact error and found this article: Debugging Kubernetes Networking. Others had run into the same problem and performed rigorous DNS troubleshooting in their clusters.

In the end, the root cause was attributed to the absence of the br_netfilter kernel module on some of the nodes. In our case, after the nodes restarted, the Linux nodes' iptables did not correctly see bridged traffic because the br_netfilter kernel module was not loaded on 2 out of 3 nodes. We could check that by listing the contents of the /proc/sys/net folder on the affected nodes:

root@worker-0:~# ll /proc/sys/net
total 0
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ./
dr-xr-xr-x 1 root root 0 Nov 14 10:10 ../
dr-xr-xr-x 1 root root 0 Nov 14 11:42 core/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv4/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv6/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 iw_cm/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 netfilter/
-rw-r--r-- 1 root root 0 Nov 14 11:42 nf_conntrack_max
dr-xr-xr-x 1 root root 0 Nov 14 11:42 unix/

The bridge folder was missing.
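Another quick way to confirm whether the module is loaded is to query the kernel module list directly; empty output means br_netfilter is not loaded:

# Check whether the br_netfilter module is currently loaded
lsmod | grep br_netfilter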

As a result, the DNS service was failing because the response to DNS requests on the affected nodes came back from the CoreDNS pod directly instead of the kube-dns service the requests were sent to. Without br_netfilter, bridged pod-to-pod traffic bypasses iptables, so the DNAT that kube-proxy applied on the way to the service is never reversed on the reply, and the client sees the answer arrive from the pod IP rather than the service IP it queried.


To resolve the problem we explicitly loaded the missing kernel module by running: sudo modprobe br_netfilter.

The bridge folder was then created and appeared in the /proc/sys/net directory:

root@worker-0:~# ll /proc/sys/net
total 0
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ./
dr-xr-xr-x 1 root root 0 Nov 14 10:10 ../
dr-xr-xr-x 1 root root 0 Dec 21 17:13 bridge/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 core/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv4/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 ipv6/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 iw_cm/
dr-xr-xr-x 1 root root 0 Nov 14 11:42 netfilter/
-rw-r--r-- 1 root root 0 Nov 14 11:42 nf_conntrack_max
dr-xr-xr-x 1 root root 0 Nov 14 11:42 unix/
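Note that modprobe on its own does not survive a reboot, which is presumably how the module went missing after the nodes restarted. A common, non-Sisense-specific way to make the fix persistent on systemd-based distributions is to declare the module and the related sysctl, for example:

# Load the module now
sudo modprobe br_netfilter
# Load it automatically on every boot
echo br_netfilter | sudo tee /etc/modules-load.d/br_netfilter.conf
# Ensure bridged traffic traverses iptables (setting created by the module)
sudo sysctl net.bridge.bridge-nf-call-iptables=1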

Does Sisense check if the module is loaded?

Short answer: yes. During installation, Sisense checks that the br_netfilter kernel module is loaded, along with the other modules listed in kubespray/roles/rke/defaults/main.yml. The check is performed by the 'Enable kernel modules' task in kubespray/roles/rke/tasks/kernel.yml.

## Kernel Modules
default_kernel_modules:
  - br_netfilter
  - ip6_udp_tunnel
  - ip_set
  - ip_set_hash_ip
  - ip_set_hash_net
  - iptable_filter
  - iptable_nat
  - iptable_mangle
  - iptable_raw
  - nf_conntrack_netlink
  - nf_conntrack
  - nf_defrag_ipv4
  - nf_nat
  - nfnetlink
  - udp_tunnel
  - veth
  - vxlan
  - x_tables
  - xt_addrtype
  - xt_conntrack
  - xt_comment
  - xt_mark
  - xt_multiport
  - xt_nat
  - xt_recent
  - xt_set
  - xt_statistic
  - xt_tcpudp
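For illustration only, an Ansible task that loads such a list of modules typically looks something like the sketch below; this is an assumption about the general pattern, not the actual contents of kubespray/roles/rke/tasks/kernel.yml:

# Hypothetical sketch of an 'Enable kernel modules' task
- name: Enable kernel modules
  modprobe:
    name: "{{ item }}"
    state: present
  with_items: "{{ default_kernel_modules }}"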

To check whether that step passed, we can look at the installation log file, sisense-ansible.log.

It should look like this: 

2022-12-15 14:32:51,628 p=5296 u=ubuntu n=ansible | TASK [rke : Enable kernel modules] *********************************************************************************************************************************************************************************************************
2022-12-15 14:32:51,629 p=5296 u=ubuntu n=ansible | ok: [node143] => (item=br_netfilter)
2022-12-15 14:32:51,630 p=5296 u=ubuntu n=ansible | ok: [node44] => (item=br_netfilter)
2022-12-15 14:32:51,636 p=5296 u=ubuntu n=ansible | ok: [node208] => (item=br_netfilter)

(or 'changed' instead of 'ok' if the module was not loaded before and was just enabled by the installer)
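A quick way to find the relevant lines in a long log file (the log file location is an assumption; adjust the path to where your installer wrote it):

# Show the kernel-module task and the few lines after it
grep -A 5 "Enable kernel modules" sisense-ansible.log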

Summary  

In conclusion, we hope the information in this article proves helpful in the event you encounter the ‘reply from unexpected source’ DNS error in your deployment.

References 

Debugging DNS Resolution
Debugging Kubernetes Networking
What's bridge-netfilter?
KubeDNS not working inside of pod when its containers are on the same node with kube-dns containers ...
