
eBPF Network Deep Dive

This TP (lab exercise) runs on a DigitalOcean cluster.

Table of contents

Goal of the TP

Understand routing and NAT with Cilium in a particular setup. More details at http://arthurchiao.art/blog/cilium-life-of-a-packet-pod-to-service/

View from the Pod

Connect to a Pod.

Start a ping to an external address (for example 1.1, which expands to 1.0.0.1).
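A minimal way to do this (a sketch, assuming a test pod named netshoot in the default namespace; use any pod that has ping installed):

```
kubectl exec -it netshoot -- sh      # hypothetical pod name
# inside the Pod:
ping -c 3 1.1                        # "1.1" is parsed as 1.0.0.1
```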

Observe the Pod's routing table:

#route -n 

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.244.0.201    0.0.0.0         UG    0      0        0 eth0
10.244.0.201    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
# arp -an
? (10.244.0.201) at f6:93:7d:11:4d:37 [ether]  on eth0

We can see that the Pod's gateway is 10.244.0.201, with MAC address f6:93:7d:11:4d:37.

#ip a
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether a6:c6:93:71:26:19 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.0.213/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a4c6:93ff:fe71:2619/64 scope link 
       valid_lft forever preferred_lft forever

View from the Node

Now connect to the Node (with kubectl node-shell) and run:

ip a | grep -A1 -B1 f6:93:7d:11:4d:37

25: lxcf4cd39e34601@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:93:7d:11:4d:37 brd ff:ff:ff:ff:ff:ff link-netns cni-17945243-a306-236e-f504-5197df7b88d9
    inet6 fe80::f493:7dff:fe11:4d37/64 scope link

The MAC address of the Pod's gateway belongs to the interface lxcf4cd39e34601@if24 on the Node, which is therefore the host-side end of the Pod's veth pair.
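One way to double-check this pairing is to compare interface indexes (the @if25 / @if24 suffixes seen above), for instance:

```
# inside the Pod: index of the peer interface (should print 25)
cat /sys/class/net/eth0/iflink

# on the Node: the interface holding that index is the other end of the veth
ip -o link | grep '^25:'
```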

The gateway IP itself is carried by the cilium_host interface:

# ifconfig cilium_host
cilium_host: flags=4291<UP,BROADCAST,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 10.244.0.201  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::a0a3:2ff:fe88:734  prefixlen 64  scopeid 0x20<link>
        ether a2:a3:02:88:07:34  txqueuelen 1000  (Ethernet)
        RX packets 286  bytes 20100 (19.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4642  bytes 302532 (295.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

A BPF program is attached to the Pod's lxc interface:

  tc filter show dev lxcf4cd39e34601  ingress

We get:

filter protocol all pref 1 bpf chain 0 
filter protocol all pref 1 bpf chain 0 handle 0x1 bpf_lxc.o:[from-container] direct-action not_in_hw id 13245 tag 7febdcdfe7cdd86f jited 
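If bpftool is available (this is an assumption about the node image; it is usually present in the cilium-agent pod), the same program can be inspected from the BPF side:

```
bpftool net show                          # tc/XDP programs attached per interface
bpftool prog show id 13245                # metadata of the from-container program above
bpftool prog dump xlated id 13245 | head  # first translated instructions
```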

Routing on the Node

The Node then routes the packet according to its own routing table:

   route -n

We get:

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         165.22.16.1     0.0.0.0         UG    0      0        0 eth0
10.19.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth0
10.135.0.0      0.0.0.0         255.255.0.0     U     0      0        0 eth1
10.244.0.0      10.135.152.71   255.255.255.128 UG    0      0        0 eth1
10.244.0.128    10.244.0.148    255.255.255.128 UG    0      0        0 cilium_host
10.244.0.148    0.0.0.0         255.255.255.255 UH    0      0        0 cilium_host
165.22.16.0     0.0.0.0         255.255.240.0   U     0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

The packet leaves through eth0. Let's check whether a BPF program is attached there:

tc filter show dev eth0 egress
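To get an overview of every interface Cilium hooks on this Node, a quick sketch:

```
# list tc BPF filters on every network interface of the Node
for dev in $(ls /sys/class/net); do
  echo "== $dev =="
  tc filter show dev "$dev" ingress
  tc filter show dev "$dev" egress
done
```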

Note that there are many iptables rules on the Node:

# iptables-save | grep -c KUBE
138
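Most of these belong to kube-proxy's service chains; the NAT side, where masquerading is decided, can be inspected with for instance:

```
iptables -t nat -S POSTROUTING | head             # chains traversed on the way out
iptables-save -t nat | grep -i masquerade | head  # the actual MASQUERADE rules
```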

View from the cilium-agent

Connect to the Cilium pod running on that Node:

kubectl exec -ti cilium-r7ffp -nkube-system -- bash
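The pod name differs on every node; to find the cilium-agent pod running on a given Node (here node-7o7xd, the node used later in this TP), something like:

```
kubectl -n kube-system get pods -l k8s-app=cilium -o wide \
  --field-selector spec.nodeName=node-7o7xd
```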

If we run wget http://1.1 in the original Pod, we can display the eBPF connection-tracking table:

# cilium bpf ct list global | grep 1.0.0.1
TCP OUT 10.244.0.213:46486 -> 1.0.0.1:80 expires=17755044 RxPackets=4 RxBytes=614 RxFlagsSeen=0x1b LastRxReport=17755035 TxPackets=6 TxBytes=415 TxFlagsSeen=0x1b LastTxReport=17755035 Flags=0x0013 [ RxClosing TxClosing SeenNonSyn ] RevNAT=0 SourceSecurityID=42837 IfIndex=0 
ICMP OUT 10.244.0.213:51724 -> 1.0.0.1:0 expires=17753923 RxPackets=2 RxBytes=196 RxFlagsSeen=0x00 LastRxReport=17753864 TxPackets=2 TxBytes=196 TxFlagsSeen=0x00 LastTxReport=17753864 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=42837 IfIndex=0 
ICMP OUT 10.244.0.213:0 -> 1.0.0.1:0 related expires=17755093 RxPackets=0 RxBytes=0 RxFlagsSeen=0x00 LastRxReport=0 TxPackets=1 TxBytes=74 TxFlagsSeen=0x02 LastTxReport=17755035 Flags=0x0010 [ SeenNonSyn ] RevNAT=0 SourceSecurityID=42837 IfIndex=0 
ICMP OUT 10.244.0.213:41227 -> 1.0.0.1:0 expires=17754874 RxPackets=104 RxBytes=10192 RxFlagsSeen=0x00 LastRxReport=17754815 TxPackets=104 TxBytes=10192 TxFlagsSeen=0x00 LastTxReport=17754815 Flags=0x0000 [ ] RevNAT=0 SourceSecurityID=42837 IfIndex=0 

We can also watch the traffic with cilium monitor:

# cilium monitor | grep 1.0.0.1
level=info msg="Initializing dissection cache..." subsys=monitor
-> stack flow 0x8958dbda identity 42837->world state new ifindex 0 orig-ip 0.0.0.0: 10.244.0.213:59242 -> 1.0.0.1:80 tcp SYN
-> endpoint 762 flow 0x0 identity world->42837 state reply ifindex lxcf4cd39e34601 orig-ip 1.0.0.1: 1.0.0.1:80 -> 10.244.0.213:59242 tcp SYN, ACK
-> stack flow 0x8958dbda identity 42837->world state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.213:59242 -> 1.0.0.1:80 tcp ACK
-> stack flow 0x8958dbda identity 42837->world state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.213:59242 -> 1.0.0.1:80 tcp ACK
-> endpoint 762 flow 0x0 identity world->42837 state reply ifindex lxcf4cd39e34601 orig-ip 1.0.0.1: 1.0.0.1:80 -> 10.244.0.213:59242 tcp ACK
-> stack flow 0x8958dbda identity 42837->world state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.213:59242 -> 1.0.0.1:80 tcp ACK, FIN
-> endpoint 762 flow 0x0 identity world->42837 state reply ifindex lxcf4cd39e34601 orig-ip 1.0.0.1: 1.0.0.1:80 -> 10.244.0.213:59242 tcp ACK, FIN
-> stack flow 0x8958dbda identity 42837->world state established ifindex 0 orig-ip 0.0.0.0: 10.244.0.213:59242 -> 1.0.0.1:80 tcp ACK
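cilium monitor is verbose; it can be narrowed down, for example to the endpoint id seen in the trace above (762), or to drops only:

```
cilium monitor --related-to 762   # only flows involving this endpoint
cilium monitor --type drop        # only dropped packets
```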

In this cluster's setup, NAT is not handled by Cilium:

# cilium bpf nat list
Unable to open /sys/fs/bpf/tc/globals/cilium_snat_v4_external: Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v4_external: no such file or directory. Skipping.
Unable to open /sys/fs/bpf/tc/globals/cilium_snat_v6_external: Unable to get object /sys/fs/bpf/tc/globals/cilium_snat_v6_external: no such file or directory. Skipping.
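These two SNAT maps are only created when the BPF datapath itself performs NAT (e.g. BPF masquerading). The maps that do exist on this Node can be listed, for instance:

```
ls /sys/fs/bpf/tc/globals/ | head   # maps pinned by Cilium on this Node
cilium map list                     # maps as seen by the agent
```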

And indeed, from your PC or Gitpod:

% cilium config view | grep -i masq
enable-bpf-masquerade                  false
masquerade                             true

Since BPF masquerading is disabled, SNAT is left to netfilter; we can confirm this by looking at the conntrack table on the Node:

root@node-7o7xd:/# grep 1.0.0.1 /proc/net/nf_conntrack 
ipv4     2 tcp      6 117 TIME_WAIT src=10.244.0.213 dst=1.0.0.1 sport=41708 dport=80 src=1.0.0.1 dst=164.92.138.123 sport=80 dport=41708 [ASSURED] mark=0 zone=0 use=2
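The reply tuple shows the source address was rewritten to 164.92.138.123, the Node's public address: the SNAT is indeed performed by netfilter. To watch it live, a sketch (assuming conntrack-tools is installed on the Node):

```
conntrack -E -p tcp --dst 1.0.0.1   # follow conntrack events for traffic to 1.0.0.1
```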

We can inspect:

  • the environment variables and arguments of the cilium-agent pod
kubectl describe pod/cilium-zpvw5 -n kube-system
[..]
   Command:
      cilium-agent
    Args:
      --config-dir=/tmp/cilium/config-map
      --k8s-api-server=https://2e37bf88-3149-4fb7-8141-670f20da6700.k8s.ondigitalocean.com
      --ipv4-native-routing-cidr=10.244.0.0/16
    State:          Running
      Started:      Wed, 11 Sep 2024 09:05:20 +0200
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:      300m
      memory:   300Mi
    Liveness:   http-get http://127.0.0.1:9879/healthz delay=120s timeout=5s period=30s #success=1 #failure=10
    Readiness:  http-get http://127.0.0.1:9879/healthz delay=0s timeout=5s period=30s #success=1 #failure=3
    Startup:    http-get http://127.0.0.1:9879/healthz delay=0s timeout=1s period=2s #success=1 #failure=105
    Environment:
      K8S_NODE_NAME:               (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:       kube-system (v1:metadata.namespace)
      CILIUM_CLUSTERMESH_CONFIG:  /var/lib/cilium/clustermesh/
      KUBERNETES_SERVICE_HOST:    2e37bf88-3149-4fb7-8141-670f20da6700.k8s.ondigitalocean.com
      KUBERNETES_SERVICE_PORT:    443
[..]
  • the ConfigMap used by Cilium
kubectl get cm/cilium-config -o yaml -nkube-system
apiVersion: v1
data:
  agent-not-ready-taint-key: node.cilium.io/agent-not-ready
  annotate-k8s-node: "true"
  arping-refresh-period: 30s
  auto-direct-node-routes: "true"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-lb-proto-diff: "true"
  bpf-lb-sock: "false"
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  bpf-root: /sys/fs/bpf
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: "0"
  cluster-name: default
  cluster-pool-ipv4-mask-size: "25"
  cni-chaining-mode: portmap
  cni-exclusive: "false"
  cni-log-file: /var/run/cilium/cilium-cni.log
  custom-cni-conf: "false"
  debug: "false"
  debug-verbose: ""
  disable-cnp-status-updates: "true"
  egress-gateway-reconciliation-trigger-interval: 1s
  enable-auto-protect-node-port-range: "true"
  enable-bgp-control-plane: "false"
  enable-bpf-clock-probe: "false"
  enable-endpoint-health-checking: "true"
  enable-external-ips: "false"
  enable-health-check-nodeport: "false"
  enable-health-checking: "true"
  enable-host-legacy-routing: "true"
  enable-host-port: "false"
  enable-hubble: "true"
  enable-ipv4: "true"
  enable-ipv4-big-tcp: "false"
  enable-ipv4-masquerade: "true"
  enable-ipv6: "false"
  enable-ipv6-big-tcp: "false"
  enable-ipv6-masquerade: "true"
  enable-k8s-networkpolicy: "true"
  enable-k8s-terminating-endpoint: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "true"
  enable-local-redirect-policy: "false"
  enable-node-port: "false"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-sctp: "false"
  enable-session-affinity: "true"
  enable-svc-source-range-check: "true"
  enable-vtep: "false"
  enable-well-known-identities: "true"
  enable-xt-socket-fallback: "true"
  external-envoy-proxy: "false"
  hubble-disable-tls: "false"
  hubble-listen-address: :4244
  hubble-socket-path: /var/run/cilium/hubble.sock
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: crd
  identity-gc-interval: 5m
  identity-heartbeat-timeout: 15m
  install-no-conntrack-iptables-rules: "false"
  ipam: cluster-pool
  ipam-cilium-node-update-rate: 15s
  k8s-client-burst: "10"
  k8s-client-qps: "5"
  kube-proxy-replacement: partial
  kube-proxy-replacement-healthz-bind-address: ""
  mesh-auth-enabled: "true"
  mesh-auth-gc-interval: 5m0s
  mesh-auth-queue-size: "1024"
  mesh-auth-rotated-identities-queue-size: "1024"
  metrics: +cilium_bpf_map_pressure
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-bind-protection: "true"
  nodes-gc-interval: "0"
  operator-api-serve-addr: 127.0.0.1:9234
  preallocate-bpf-maps: "false"
  procfs: /host/proc
  prometheus-serve-addr: :9090
  proxy-connect-timeout: "2"
  proxy-max-connection-duration-seconds: "0"
  proxy-max-requests-per-connection: "0"
  remove-cilium-node-taints: "true"
  routing-mode: native
  set-cilium-is-up-condition: "true"
  set-cilium-node-taints: "true"
  sidecar-istio-proxy-image: cilium/istio_proxy
  skip-cnp-status-startup-clean: "false"
  synchronize-k8s-nodes: "true"
  tofqdns-dns-reject-response-code: refused
  tofqdns-enable-dns-compression: "true"
  tofqdns-endpoint-max-ip-per-hostname: "50"
  tofqdns-idle-connection-grace-period: 0s
  tofqdns-max-deferred-connection-deletes: "10000"
  tofqdns-proxy-response-max-delay: 100ms
  unmanaged-pod-watcher-interval: "15"
  vtep-cidr: ""
  vtep-endpoint: ""
  vtep-mac: ""
  vtep-mask: ""
  write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
kind: ConfigMap
metadata:
  creationTimestamp: "2024-09-11T07:02:56Z"
  labels:
    c3.doks.digitalocean.com/component: cilium
    c3.doks.digitalocean.com/plane: data
    c3.doks.digitalocean.com/variant: legacy
    doks.digitalocean.com/managed: "true"
  name: cilium-config
  namespace: kube-system
  resourceVersion: "553"
  uid: efecf122-f5ac-4c0f-a66a-62abe89e9386
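To pull out only the keys relevant to this TP (routing mode, masquerading, kube-proxy replacement), something like:

```
kubectl -n kube-system get cm cilium-config -o yaml \
  | grep -E 'masquerade|routing-mode|kube-proxy-replacement|ipam:'
```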

Back to the table of contents | Next TP