If you read Part 1, you know the GPU Spot quota was pending and there wasn’t much to do but wait. Quota requests don’t move fast.

So naturally, my brain did what any SRE’s brain does — started looking for more money to cut. lol.

NAT Gateways. $0.045 per hour per gateway, plus $0.045 per GB processed. In a multi-AZ lab setup, that's roughly $65/month for two gateways sitting idle — before a single inference request hits the cluster.

So I went digging.

The Idea: IPv6-Only EKS + Egress-Only Internet Gateway

AWS supports IPv6-only EKS clusters, and the economics are actually compelling. An Egress-Only Internet Gateway (EIGW) is free. Nodes and pods can reach the internet for image pulls, SSM, and the Kubernetes API, while unsolicited inbound traffic is blocked at the routing level. No NAT Gateway, no per-GB tax.

The trade-off is that nodes end up with public IPv6 addresses. For production, that’s a conversation. For a lab? Totally fine with tight security groups.

Here’s what the Terraform changes looked like:

# Free egress for IPv6 — replaces NAT Gateway
resource "aws_egress_only_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "smigtech-eks-eigw" }
}

resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id

  # IPv4 still needed for some AWS endpoints
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }

  # IPv6 outbound via EIGW — no unsolicited inbound
  route {
    ipv6_cidr_block        = "::/0"
    egress_only_gateway_id = aws_egress_only_internet_gateway.main.id
  }
}

And on the cluster itself:

kubernetes_network_config {
  ip_family         = "ipv6"
  service_ipv6_cidr = null  # Let AWS assign from fd00::/8
}

One thing worth knowing: ip_family is set at cluster creation and can’t be changed afterward. Plan before you deploy.

Locking Down Nodes on Public Subnets

Nodes on public IPv6 addresses sound scarier than they are. Security groups do the heavy lifting here.

The key insight: with Envoy Gateway fronting external traffic via a Network Load Balancer, the NodePort range doesn’t need to be open to the world at all. In IP mode, the NLB talks directly to pod IPs, so nothing needs to reach the nodes from outside except the control plane.

Traffic flow looks like this:

Internet → NLB → Envoy Gateway pods → backend services
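Getting the NLB into IP mode comes down to annotations on the Service that Envoy Gateway creates. A sketch (assuming the AWS Load Balancer Controller is installed; this uses the EnvoyProxy v1alpha1 API, and is not the exact manifest from my repo):

```yaml
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: proxy-config
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: external
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
```

With `nlb-target-type: ip`, the NLB registers pod IPs directly as targets, which is what lets the NodePort range stay closed.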

Minimum node security group rules:

Direction | Port  | Source           | Purpose
Ingress   | 443   | Cluster SG       | Control plane → webhooks
Ingress   | 10250 | Cluster SG       | Control plane → kubelet (exec, logs, metrics)
Ingress   | All   | Self (node SG)   | Inter-node: CNI, DNS
Egress    | All   | 0.0.0.0/0, ::/0  | Outbound (ECR, SSM)

No NodePort range. No broad ingress from the internet. Clean.
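In Terraform, those rules might look like this — a sketch, where the cluster SG reference assumes an aws_eks_cluster.main resource from the rest of the stack:

```hcl
resource "aws_security_group" "node" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "smigtech-eks-node-sg" }

  # Control plane → webhooks
  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
  }

  # Control plane → kubelet (exec, logs, metrics)
  ingress {
    from_port       = 10250
    to_port         = 10250
    protocol        = "tcp"
    security_groups = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
  }

  # Inter-node: CNI, DNS, pod-to-pod
  ingress {
    from_port = 0
    to_port   = 0
    protocol  = "-1"
    self      = true
  }

  # Outbound: ECR, SSM, image pulls
  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
}
```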

What Actually Works

To be clear, most of this setup works well. IPv6-only cluster with EIGW routing, Karpenter provisioning spot nodes, Envoy Gateway exposing services through an NLB. All of it came together. Nodes stayed reachable by the control plane, pods could pull images, and DNS resolved correctly.

For standard Kubernetes workloads, IPv6-only EKS is a legitimate cost play. The EIGW alone eliminates the NAT Gateway bill and is worth the setup effort.

Where It Falls Apart: KServe and uvicorn

Here’s where the rabbit hole hits a wall.

KServe’s model server starts two listeners when you deploy an InferenceService: a gRPC server and a REST server backed by uvicorn. On an IPv6-only cluster, pods only have IPv6 addresses. When kubelet runs its readiness and liveness checks, it hits the pod’s IPv6 address.

The gRPC server handles this correctly. You can see it in the logs:

Starting gRPC server on [::]:8081

The REST server does not:

Application startup complete.
Uvicorn running on http://0.0.0.0:8080

0.0.0.0 is the IPv4 wildcard. On a pod with no IPv4 stack, that socket can’t receive any traffic. Kubelet hits the IPv6 address, gets connection refused, marks the pod unhealthy, and it crashloops. Every time.
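You can reproduce the failure mode outside Kubernetes entirely. This is a standalone demo, not KServe code: bind a listener to the IPv4 wildcard the way uvicorn does, then probe it over IPv6 loopback the way kubelet probes an IPv6-only pod.

```python
import socket

# Server bound to the IPv4 wildcard — equivalent to host="0.0.0.0".
v4_only = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
v4_only.bind(("0.0.0.0", 0))
v4_only.listen(1)
port = v4_only.getsockname()[1]

# Probe it the way kubelet does on an IPv6-only pod: over IPv6.
refused = False
try:
    probe = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    probe.settimeout(1)
    probe.connect(("::1", port))
    probe.close()
except OSError:
    refused = True  # connection refused — exactly what kubelet sees
finally:
    v4_only.close()

# A listener bound to "::" (the IPv6 wildcard) would have accepted this.
```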

The problem lives in python/kserve/kserve/protocol/rest/server.py:

self.cfg = uvicorn.Config(
    app,
    host="0.0.0.0",  # hardcoded — this is the problem
    port=http_port,
    ...
)

The gRPC server already binds to [::]. The REST server doesn't. The inconsistency makes the fix obvious: add a listen-address argument to the model server's argparser, default it to 0.0.0.0 for backward compatibility, and thread it through to uvicorn.Config.
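A minimal sketch of what that patch shape could look like. The flag name here is my assumption, not KServe's actual CLI:

```python
import argparse

# Hypothetical sketch of the proposed fix; "--listen_address" is an
# assumed flag name, not something KServe ships today.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--listen_address",
    default="0.0.0.0",  # backward-compatible default
    help='bind address for the REST server; use "::" on IPv6-only clusters',
)
args, _ = parser.parse_known_args([])

# ...then thread it through instead of the hardcoded literal:
# uvicorn.Config(app, host=args.listen_address, port=http_port, ...)
```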

I explored the workarounds:

  • UVICORN_HOST env var — doesn’t work. KServe passes host= explicitly to uvicorn.Config, and explicit kwargs override env vars.
  • socat sidecar — forwards IPv6 to the IPv4 uvicorn port. Works technically, but adds latency on an inference path. Hard no.
  • Init container monkey-patch — possible but brittle. Not something you want in a setup you’re trying to reproduce.

None of these were worth shipping.

The Verdict

The IPv6 optimization is sound. The EIGW approach works, the security posture is reasonable, and for most EKS workloads, this is worth doing. But KServe’s REST server has a gap that prevents it from working on IPv6-only clusters today.

This is a legitimate upstream issue — the fix is small, and the gRPC/REST inconsistency makes a clear case for it. If that lands, the full IPv6 path opens up.

For now, the cluster is back on a standard configuration. And the quota came through.

Part 3 picks up there — GPU Spot nodes via Karpenter, KServe serving a model, and OpenWebUI wired up as the chat interface. The actual goal.


Terraform and GitLab CI for this setup: gitlab.com/smiggiddy/eks