Solving the "Zombie Connection" Race Condition in Istio/Envoy [Linux]

Introduction: If you are running Sisense on Kubernetes with Istio (ASM), you may encounter intermittent 502 Bad Gateway or 503 Service Unavailable errors during JAQL queries, even when your pods appear healthy.

Step-by-Step Guide:

1. The Problem: Timeout Mismatch

By default, the Istio Ingress Gateway (Envoy) keeps idle TCP connections to backend pods open for a long time (the idle timeout defaults to 1 hour). However, the underlying GCP VPC network or the Sisense application pods often have much shorter idle timeouts (usually 60 seconds).

This discrepancy creates a race condition:

A connection sits idle between dashboard refreshes.
The Sisense pod or the GCP network closes the connection because it has been idle for too long.
At almost the same moment, before Envoy has noticed the close, Envoy attempts to reuse that "established" connection to send a new JAQL request.
Because the other end is already closed, Envoy receives a TCP Reset (RST).
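
The sequence above can be sketched as a small Python model. This is an illustration only, not part of Envoy or Sisense: the function name reuse_outcome and the outcome labels are made up for this example, and the model takes the worst case, in which Envoy never notices a close that happens on the other side.

```python
def reuse_outcome(idle_gap_s: float,
                  envoy_idle_timeout_s: float,
                  upstream_idle_timeout_s: float) -> str:
    """Classify a request that arrives after idle_gap_s seconds of idleness.

    Hypothetical worst-case model: a close on the upstream side is assumed
    to go unnoticed by Envoy until it tries to reuse the connection.
    """
    # Neither side has timed out yet, so the pooled connection is still good.
    if idle_gap_s < min(envoy_idle_timeout_s, upstream_idle_timeout_s):
        return "reused"
    # Envoy's timer fired first: it closed the connection itself and will
    # open a fresh one for the next request.
    if envoy_idle_timeout_s < upstream_idle_timeout_s:
        return "reconnected"
    # The upstream side closed first while Envoy still considers the
    # connection established: this is the "zombie" case.
    return "stale"


if __name__ == "__main__":
    # 90 seconds between dashboard refreshes, 60 second upstream limit.
    print(reuse_outcome(90, 3600, 60))  # Envoy default of 1 hour
    print(reuse_outcome(90, 45, 60))    # Envoy set to 45 seconds
```

With the 1 hour default the model reports a stale connection, while a 45 second Envoy timeout turns the same request into a clean reconnect.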

Log Examples

In your Istio Ingress Gateway access logs, you will see response codes paired with the following Envoy response flags:

503 UC: Upstream connection termination.
101 DC: Downstream connection termination.
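
To get a feel for how often these pairs occur, a short Python script can tally them from saved gateway logs. This is a rough sketch under assumptions: it assumes the text access log format in which the quoted request line is followed directly by the response code and the response flags, the helper count_code_flags is a hypothetical name, and the sample lines are invented and simplified, not real gateway output.

```python
import re
from collections import Counter

# Assumed layout: "<METHOD> <PATH> <PROTOCOL>" <code> <flags> ...
# The flags field is either "-" or a comma-separated list such as "UC".
CODE_FLAGS = re.compile(r'"\s+(\d{3})\s+([A-Z,]+|-)\s')


def count_code_flags(lines):
    """Count (response code, response flags) pairs across access log lines."""
    counts = Counter()
    for line in lines:
        match = CODE_FLAGS.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Invented sample lines for illustration only.
SAMPLE_LINES = [
    '[2026-01-01T00:00:00.000Z] "POST /example HTTP/1.1" 503 UC 0 95 12 - "-"',
    '[2026-01-01T00:00:05.000Z] "POST /example HTTP/1.1" 200 - 0 310 48 - "-"',
    '[2026-01-01T00:01:30.000Z] "POST /example HTTP/1.1" 503 UC 0 95 3 - "-"',
]

if __name__ == "__main__":
    for (code, flags), n in count_code_flags(SAMPLE_LINES).most_common():
        print(code, flags, n)
```

If your gateway uses a custom or JSON access log format, the regular expression will need to be adapted.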

2. The Solution: EnvoyFilter Alignment

To fix this, you must force Envoy to be more "aggressive" than the network. By setting Envoy’s idle_timeout to 45 seconds, Envoy will proactively kill its own idle connections before the GCP network or the Sisense pod has a chance to drop them silently.

Critical Implementation Detail

Many users attempt to apply the filter in the application namespace (e.g., sisense), but if sidecar injection is not enabled, that filter will do nothing. The filter must be applied where the Envoy proxy actually lives: the istio-system namespace.

The Fix

Apply this configuration to target the Ingress Gateway specifically:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: sisense-idle-timeout
  namespace: istio-system # Must be in the same namespace as the Ingress
spec:
  workloadSelector:
    labels:
      app: istio-ingressgateway # Targets the Gateway proxy
  configPatches:
  - applyTo: CLUSTER
    match:
      context: GATEWAY
    patch:
      operation: MERGE
      value:
        common_http_protocol_options:
          idle_timeout: 45s # Closes connection before the 60s VPC/App limit

Conclusion: This ensures that the Ingress Gateway, which is the only Envoy proxy in a non-sidecar setup, manages the connection lifecycle properly. By "hanging up" first, it avoids reusing "zombie" connections, eliminating the 503 UC errors.

Disclaimer: This post outlines a potential custom workaround for a specific use case or provides instructions regarding a specific task. The solution may not work in all scenarios or Sisense versions, so we strongly recommend testing it in your environment before deployment. If you need further assistance with this, please let us know.

Published 04-06-2026