Edge Maintenance Mode Overview
The NSX-T Edge cluster is a logical grouping of NSX-T Edge virtual machines that provide North-South routing for the workloads in the compute clusters. An NSX-T Edge can be taken out of production by placing it in maintenance mode, for example when the Edge has become inoperable.
If high availability is enabled on the associated logical routers, entering maintenance mode will cause the logical routers to use a different NSX Edge cluster member, helping to minimize downtime.
In NSX for vSphere, Edges could operate as a High Availability pair, with one Active and one Standby. In NSX-T it is the Edge services such as routing, bridging, and load balancing that individually operate as Active/Active or Active/Standby.
In this post, we’ll take a look at NSX-T Edge Maintenance Mode, in a lab setup with high availability gateways running on an Edge Cluster.
Lab Setup
We will re-use the NSX-T lab used for Part 2 of NSX-T East-West Traffic Flow since there were two scenarios where traffic flow was over an NSX-T Edge Cluster. Let’s use Scenario 6.
Notice in this scenario that traffic traverses an Edge Node, in this case, nsxtedge01, since Tier-1 gateways are instantiated on the Edge Cluster.
High Availability Setup
Keep in mind that for NSX-T Edges, there is no concept of Active and Standby Edges in an Edge Cluster. It is the services provided by the Edge Cluster, such as Tier-0, Tier-1, Load Balancing, and Bridging that operate in Active and Standby modes.
In this lab, Tier-0 is set to Active-Active, and the Tier-1s are set to Active-Standby:
Also, note that lab-tier-1-tenant-1 and lab-tier-1-tenant-2 Tier-1 Gateways are instantiated on the Edge Cluster and that Failover is set to Non-Preemptive.
Since nsxtedge01 is currently on the traffic path, let’s put this Edge in maintenance mode. This operation can be performed via CLI or API.
Note on the deprecated fabric node API
The original version of this article referenced the fabric node APIs to put an Edge into maintenance mode. The fabric node APIs were deprecated in NSX-T 2.5 and will be removed in a future release. They have been replaced by the transport node APIs, which are used in this revised article.
The deprecated fabric node API uses the following syntax: /api/v1/fabric/nodes/
The replacement transport node API uses the following syntax: /api/v1/transport-nodes/
Placing an NSX-T Edge in Maintenance Mode using the REST API
The REST API method of placing an Edge in Maintenance Mode can be found in the NSX-T Admin Guide. It is shown here using Postman for Chrome. Start by getting the transport node ID of nsxtedge01:
With an nsxtedge01 node ID of 7af45036-6d42-11ea-a22d-00505696b642, run the GET again for just nsxtedge01 as a sanity check. Notice that maintenance_mode is currently DISABLED:
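The same two checks can be scripted instead of clicked through in Postman. The following is a minimal sketch in Python, not captured lab output: it assumes the body returned by GET /api/v1/transport-nodes has a results array whose entries carry id, display_name, and maintenance_mode fields. The sample payload is trimmed and illustrative, and the helper names find_node_id and in_maintenance_mode are hypothetical.

```python
import json

# Trimmed, illustrative stand-in for the body of GET /api/v1/transport-nodes.
# Real responses carry many more fields; this layout is an assumption.
SAMPLE_LIST = json.loads("""
{
  "results": [
    {"id": "7af45036-6d42-11ea-a22d-00505696b642",
     "display_name": "nsxtedge01",
     "maintenance_mode": "DISABLED"}
  ],
  "result_count": 1
}
""")


def find_node_id(listing, display_name):
    """Return the ID of the single transport node with this display name."""
    matches = [node["id"] for node in listing.get("results", [])
               if node.get("display_name") == display_name]
    if len(matches) != 1:
        raise LookupError(
            f"expected one node named {display_name!r}, found {len(matches)}")
    return matches[0]


def in_maintenance_mode(node):
    """True unless the node reports maintenance_mode as DISABLED."""
    return node["maintenance_mode"] != "DISABLED"


node_id = find_node_id(SAMPLE_LIST, "nsxtedge01")
print(node_id, in_maintenance_mode(SAMPLE_LIST["results"][0]))
```

Matching on display_name and insisting on exactly one hit avoids silently picking the wrong Edge when names are duplicated or misspelled.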
Use a POST to place nsxtedge01 in maintenance mode, appending “?action=enter_maintenance_mode” to the transport node URL. If successful, the response will be 200 OK:
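For reference, here is a hedged sketch of how that POST could be built with Python's standard library. The manager hostname and credentials are placeholders, the use of HTTP Basic authentication is an assumption about the lab, and exit_maintenance_mode is included on the assumption that it is the matching reverse action. The function only builds the request; sending it (and handling the manager's TLS certificate) is left to the caller.

```python
import base64
import urllib.request

# enter_maintenance_mode comes from the article; exit_maintenance_mode is
# assumed to be its counterpart.
ACTIONS = {"enter_maintenance_mode", "exit_maintenance_mode"}


def maintenance_request(manager, node_id, action, user, password):
    """Build (but do not send) the POST that changes an Edge's maintenance mode."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    url = f"https://{manager}/api/v1/transport-nodes/{node_id}?action={action}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


req = maintenance_request(
    "nsxmgr.lab.local",                      # hypothetical manager FQDN
    "7af45036-6d42-11ea-a22d-00505696b642",  # nsxtedge01 node ID from above
    "enter_maintenance_mode",
    "admin", "changeme",                     # placeholder credentials
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req, context=<an ssl.SSLContext>)
```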
Repeating Traceflow shows that traffic now flows over nsxtedge02, as expected:
Removing an NSX-T Edge from Maintenance Mode using the CLI
nsxtedge01> get maintenance-mode
Maintenance Mode: enabled    <- from the CLI we can see that Maintenance Mode is enabled

nsxtedge01> set maintenance-mode ?
  enabled     Specify if a feature should be enabled or disabled
  disabled    Specify if a feature should be enabled or disabled

nsxtedge01> set maintenance-mode disabled    <- take nsxtedge01 back out of Maintenance Mode
Maintenance Mode: disabled

nsxtedge01> get maintenance-mode
Maintenance Mode: disabled    <- nsxtedge01 is now back in production
Since Tier-1 Gateway Failover is set to Non-Preemptive, traffic remains over nsxtedge02:
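To make the difference between the two failover settings concrete, here is a toy model in Python. It is not how NSX-T implements failover; it simply replays up/down events for a pair of Edges and reports which one ends up active, assuming that a preemptive gateway returns to its preferred Edge as soon as that Edge is available again. The function name active_edge is hypothetical.

```python
def active_edge(preferred, standby, events, preemptive):
    """Replay (edge_name, is_up) events in order; return the final active Edge."""
    up = {preferred: True, standby: True}
    active = preferred
    for edge, is_up in events:
        up[edge] = is_up
        other = standby if active == preferred else preferred
        if not up[active] and up[other]:
            active = other        # fail over away from an unavailable Edge
        elif preemptive and up[preferred]:
            active = preferred    # preemptive: preferred Edge reclaims the role
    return active


# nsxtedge01 enters maintenance mode, then leaves it again.
events = [("nsxtedge01", False), ("nsxtedge01", True)]
print(active_edge("nsxtedge01", "nsxtedge02", events, preemptive=False))
print(active_edge("nsxtedge01", "nsxtedge02", events, preemptive=True))
```

With preemptive=False the model leaves the gateway on nsxtedge02 after nsxtedge01 returns, which matches what Traceflow shows in this lab.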
How disruptive are these Maintenance Mode operations?
Let’s save this interesting question for part 2 in this series.