Introduction:
One of my NSX peers was recently working on an IP address overlap issue, and troubleshooting it led to a better understanding of routing behaviour within an NSX environment.
The Scenario:
In this corner case scenario there is IP address overlap between these two subnets:
- The NSX environment, Web-Segment, 172.16.10.0/24
- The External environment, Legacy-Segment-2, 172.16.10.100/32
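The overlap can be confirmed with a few lines of Python. This is only a sketch using the standard-library ipaddress module; the variable names are illustrative, and the prefixes are the ones used in this scenario.

```python
import ipaddress

# Prefixes taken from the scenario; the variable names are illustrative only.
web_segment = ipaddress.ip_network("172.16.10.0/24")
legacy_segment_1 = ipaddress.ip_network("172.18.10.100/32")
legacy_segment_2 = ipaddress.ip_network("172.16.10.100/32")

# Legacy-Segment-2 is a host route that falls inside the Web-Segment range.
print(web_segment.overlaps(legacy_segment_2))   # True
print(legacy_segment_2.subnet_of(web_segment))  # True

# Legacy-Segment-1 sits outside the Web-Segment range, so there is no overlap.
print(web_segment.overlaps(legacy_segment_1))   # False
```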
Connectivity State:
The Guest VM, app, has partial IP connectivity:
- app can reach web
- app can reach Legacy-Segment-1
- app cannot reach Legacy-Segment-2
Verify Routing at the NSX Edge Tier-0 Service Router:
Edge-01> get logical-routers
Edge-01> vrf 1
Edge-01(tier0_sr[1])> get route
… routing table shortened for brevity
t0c> * 172.16.10.0/24 is directly connected, downlink-275, 3d10h21m <– Web-Segment route
b > * 172.16.10.100/32 [20/66] via 192.168.100.1, uplink-271, 2d04h08m <– Legacy-Segment-2 route
b > * 172.18.10.100/32 [20/66] via 192.168.100.1, uplink-271, 2d03h47m <– Legacy-Segment-1 route
…
From this we can see that the three target networks are in the Tier-0 gateway routing table.
Verify Forwarding at the NSX Edge Tier-0 Distributed Router:
Edge-01(tier0_sr[1])> vrf 3
Edge-01(vrf[3])> get forwarding
… forwarding table shortened for brevity
172.16.10.0/24 route 6949af09-65f8-4d7e-a48e-96689224e476 <– Web-Segment route in forwarding table
172.16.10.100/32 192.168.100.1 route bb21e6a4-21c1-419a-ae57-25659decd4a2 00:50:56:02:87:99 <– Legacy-Segment-2 in forwarding table
172.18.10.100/32 192.168.100.1 route bb21e6a4-21c1-419a-ae57-25659decd4a2 00:50:56:02:87:99 <– Legacy-Segment-1 in forwarding table
From this we can see that the three target networks are in the Tier-0 gateway forwarding table.
Verify Routing at the ESXi Host Tier-0 Distributed Router:
[root@esxi-02:~] net-vdr -l --route b188adce-aaf6-4203-84bb-5e9027b72ab0
DR b188adce-aaf6-4203-84bb-5e9027b72ab0 Route Table
Legend: [U: Up], [G: Gateway], [C: Connected], [I: Interface]
Legend: [H: Host], [B: Blackhole], [F: Soft Flush] [!: Reject] [E: ECMP]
Destination GenMask Gateway Flags Ref UpTime HitCount Lif UUID
----------- ------- ------- ----- --- ------ -------- --------
0.0.0.0 0.0.0.0 169.254.0.2 UGE 1 109675 452 db804f8e-7c43-46b6-a479-8f826aa4d01e <– default route to NSX Edge Edge-01
0.0.0.0 0.0.0.0 169.254.0.3 UGE 1 109675 452 db804f8e-7c43-46b6-a479-8f826aa4d01e <– default route to NSX Edge Edge-01
169.254.0.0 255.255.255.128 0.0.0.0 UCI 1 109675 3 db804f8e-7c43-46b6-a479-8f826aa4d01e
172.16.10.0 255.255.255.0 0.0.0.0 UCI 1 109675 639 6949af09-65f8-4d7e-a48e-96689224e476 <– Web-Segment route
172.16.20.0 255.255.255.0 0.0.0.0 UCI 1 109675 3 4412afc4-b2e0-4ea8-aaca-d5cd9d4bbf41
172.16.30.0 255.255.255.0 0.0.0.0 UCI 1 109675 1 efc66c34-8418-431d-9205-225ab585b257
172.16.40.0 255.255.255.0 0.0.0.0 UCI 1 109675 1 273eaefa-9375-4bd2-bfc2-34205f3b71f8
192.168.100.0 255.255.255.0 169.254.0.2 UG 1 109675 1 db804f8e-7c43-46b6-a479-8f826aa4d01e <– NSX Edge uplink route
192.168.100.2 255.255.255.255 169.254.0.2 UGH 1 109675 1 db804f8e-7c43-46b6-a479-8f826aa4d01e
192.168.110.0 255.255.255.0 169.254.0.3 UG 1 109675 1 db804f8e-7c43-46b6-a479-8f826aa4d01e
192.168.110.2 255.255.255.255 169.254.0.3 UGH 1 109675 1 db804f8e-7c43-46b6-a479-8f826aa4d01e
From this we can see that the legacy networks are not in the ESXi host Tier-0 Distributed Router forwarding table.
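The same check can be scripted. The sketch below assumes the Destination and GenMask columns have been copied by hand from the net-vdr output above; HOST_DR_ROUTES and to_prefixes are hypothetical names, not part of any NSX tooling.

```python
import ipaddress

# Destination and GenMask columns copied by hand from the net-vdr output above.
# The duplicate ECMP default route is listed once.
HOST_DR_ROUTES = [
    ("0.0.0.0", "0.0.0.0"),
    ("169.254.0.0", "255.255.255.128"),
    ("172.16.10.0", "255.255.255.0"),
    ("172.16.20.0", "255.255.255.0"),
    ("172.16.30.0", "255.255.255.0"),
    ("172.16.40.0", "255.255.255.0"),
    ("192.168.100.0", "255.255.255.0"),
    ("192.168.100.2", "255.255.255.255"),
    ("192.168.110.0", "255.255.255.0"),
    ("192.168.110.2", "255.255.255.255"),
]

def to_prefixes(rows):
    """Convert (destination, netmask) pairs into a set of ip_network objects."""
    return {ipaddress.ip_network(f"{dest}/{mask}") for dest, mask in rows}

host_prefixes = to_prefixes(HOST_DR_ROUTES)

# The two legacy host routes seen in the Edge tables.
legacy = {
    "Legacy-Segment-1": ipaddress.ip_network("172.18.10.100/32"),
    "Legacy-Segment-2": ipaddress.ip_network("172.16.10.100/32"),
}

# Names of legacy prefixes that have no exact entry in the host DR table.
missing = sorted(name for name, net in legacy.items() if net not in host_prefixes)
print(missing)
```

Both legacy prefixes should be reported as missing, while the connected Web-Segment prefix 172.16.10.0/24 is present.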
Routing and Forwarding Table Comparison:
Here is a logical view of the routing and forwarding tables in play:
Notice that:
- The NSX Edge Tier-0 DR forwarding table lines up with the NSX Edge Tier-0 SR routing table.
- The Edge forwarding tables have the same forwarding entries.
- The host Tier-0 DR forwarding table does not match the NSX Edge Tier-0 DR forwarding table.
- The host Tier-0 DR forwarding table only includes the entries it needs for directly connected networks, gateway uplinks, and the default routes.
- Not all DR forwarding tables are equal.
ESXi Host Troubleshooting:
VM app resides on ESXi host esxi-01.
[root@esxi-01:~] net-stats -l | grep app.eth0
67108906 5 9 DvsPortset-0 00:50:56:9d:f4:e8 app.eth0 <– app is connected to port 67108906.
[root@esxi-01:~] pktcap-uw --switchport 67108906 --dir 2 -o - | tcpdump-uw -r - -nn
16:00:05.669062 IP 172.16.20.11 > 172.16.10.100: ICMP echo request, id 11, seq 2, length 64 <– app ICMP echo request to 172.16.10.100
16:00:06.693111 IP 172.16.20.11 > 172.16.10.100: ICMP echo request, id 11, seq 3, length 64
16:00:09.082008 IP 172.16.20.1 > 172.16.20.11: ICMP host 172.16.10.100 unreachable, length 92 <– Tier-0 DR on ESXi replies it is unreachable
16:00:09.082008 IP 172.16.20.1 > 172.16.20.11: ICMP host 172.16.10.100 unreachable, length 92
16:00:09.082009 IP 172.16.20.1 > 172.16.20.11: ICMP host 172.16.10.100 unreachable, length 92
Because the ESXi Tier-0 DR does not have the 172.16.10.100/32 route, its longest match for 172.16.10.100 is the directly connected 172.16.10.0/24 Web-Segment, so it treats the address as local to that segment. Since there is no ARP entry for 172.16.10.100, the ESXi Tier-0 DR replies that 172.16.10.100 is unreachable.
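The difference in behaviour comes down to longest-prefix matching against two different tables. The following is a minimal sketch of that lookup, not how NSX implements it: HOST_DR and EDGE_DR are hand-reduced subsets of the tables shown earlier, the next-hop strings are descriptive labels, and only one of the two ECMP default routes is modelled.

```python
import ipaddress

def longest_prefix_match(table, destination):
    """Return (prefix, next_hop) for the most specific matching entry, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop

# Reduced host DR table: connected segments plus a default route to the Edge.
HOST_DR = {
    "0.0.0.0/0": "via 169.254.0.2 (NSX Edge)",
    "172.16.10.0/24": "connected (Web-Segment)",
    "172.16.20.0/24": "connected",
}

# Reduced Edge table: it also carries the BGP-learned legacy host routes.
EDGE_DR = {
    "172.16.10.0/24": "connected (Web-Segment)",
    "172.16.10.100/32": "via 192.168.100.1",
    "172.18.10.100/32": "via 192.168.100.1",
}

# On the host, the overlapping address resolves to the connected Web-Segment.
print(longest_prefix_match(HOST_DR, "172.16.10.100"))
# The non-overlapping legacy address follows the default route to the Edge.
print(longest_prefix_match(HOST_DR, "172.18.10.100"))
# On the Edge, the /32 is the most specific match, so traffic would go out the uplink.
print(longest_prefix_match(EDGE_DR, "172.16.10.100"))
```

In this model the packet to 172.16.10.100 never reaches the Edge, because the host DR has already chosen the connected /24, which matches the unreachable replies seen in the capture.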
Observations:
- The NSX Edge Tier-0 DR forwarding table lines up with the NSX Edge Tier-0 SR routing table.
- The host Tier-0 DR forwarding table includes the entries it needs for directly connected networks, and for default route(s) to the NSX Edge(s).
- The ESXi host is like a stub network device, which is sufficient to cover most use cases. If a network is not local, it must be accessible through an NSX Edge.
- The host Tier-0 DR routing table does not necessarily match the NSX Edge Tier-0 DR forwarding table.
Conclusion:
The simplified NSX host DR routing has the benefit of scaling well, but has the drawback of not being able to support this IP address overlap corner case.