
NSX-T Load Balancer Troubleshooting

One of the best ways to troubleshoot the NSX-T Load Balancer is to monitor and analyze its error log on the Edge while the Load Balancer error log level is set to debug. Combined with Edge CLI command output, this deep-dive analysis can be very insightful. In this post, we’ll run through a quick list of the steps required for this type of troubleshooting.

This article is not a how-to guide on NSX-T Load Balancer setup. Ronald de Jong does a nice job of that in a two-part article that starts here.

1. SSH into Edge using root account

You can either log in directly with the root account, or log in with the admin account and then enter engineering mode.

Option 1: Accessing the Edge via the root account:

login as: root
 root@192.168.110.66's password:
 TIPS:  To reconfig management interface, please refer to these CLIs
  1) stop service dataplane
  2) set interface interface-name vlan vlan-id plane mgmt (for creating vlan sub-interface)
  3) set interface interface-name ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
     set interface interface-name dhcp plane mgmt (for dhcp)
  4) start service dataplane
 To config in-band management interface, please refer to these CLIs
  1) set interface mac mac-addr vlan vlan-id in-band plane mgmt
  2) set interface eth0.vlan ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
     set interface eth0.vlan dhcp plane mgmt (for dhcp)
 Last login: Sun Oct  6 11:27:06 2019 from 192.168.110.10

 NOTICE TO USERS
 WARNING! Changes made to NSX Data Center while logged in as the root user
 can cause system failure and potentially impact your network. Please be
 advised that changes made to the system as the root user must only be made
 under the guidance of VMware.

 root@nsxtedge02:~#


Option 2: Through the admin account and then entering engineering mode:

login as: admin
 admin@192.168.110.66's password:
 TIPS:  To reconfig management interface, please refer to these CLIs
  1) stop service dataplane
  2) set interface interface-name vlan vlan-id plane mgmt (for creating vlan sub-interface)
  3) set interface interface-name ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
     set interface interface-name dhcp plane mgmt (for dhcp)
  4) start service dataplane
 To config in-band management interface, please refer to these CLIs
  1) set interface mac mac-addr vlan vlan-id in-band plane mgmt
  2) set interface eth0.vlan ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
     set interface eth0.vlan dhcp plane mgmt (for dhcp)
 Last login: Fri Oct  4 12:53:02 2019 from 192.168.110.10
 NSX CLI (Edge 2.4.1.0.0.13716583). Press ? for command list or enter: help


 nsxtedge02> st en         <--- entering engineering mode on the edge
 Password:                 <--- enter the root account password

 NOTICE TO USERS
 WARNING! Changes made to NSX Data Center while logged in as the root user
 can cause system failure and potentially impact your network. Please be
 advised that changes made to the system as the root user must only be made
 under the guidance of VMware.

 root@nsxtedge02:~#

2. Determine the Load Balancer UUID from the NSX CLI

root@nsxtedge02:~# nsxcli
 NSX CLI (Edge 2.4.1.0.0.13716583). Press ? for command list or enter: help
 nsxtedge02>
 nsxtedge02>
 nsxtedge02> get load-balancer
 Load Balancer
 Access Log Enabled                 : False
 Display Name                       : web-farm
 Enabled                            : True
 UUID                               : ff550f3a-653a-4243-9652-4c3dabfcbfe7       <--- load balancer UUID
 Log Level                          : LB_LOG_LEVEL_INFO
 Size                               : SMALL
 Virtual Server Id                  : b71ca7e4-f11a-44c5-b45a-656ac05f91b6
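
If you save the output of get load-balancer to a file or paste it into a script, the UUID and the other fields can be pulled out programmatically. The following is a minimal Python sketch, assuming the output keeps the "Key : Value" layout shown above and describes a single load balancer. The function name parse_load_balancer is hypothetical, not part of any NSX tooling.

```python
# Sample text copied from the "get load-balancer" output above.
SAMPLE_OUTPUT = """\
Load Balancer
Access Log Enabled                 : False
Display Name                       : web-farm
Enabled                            : True
UUID                               : ff550f3a-653a-4243-9652-4c3dabfcbfe7
Log Level                          : LB_LOG_LEVEL_INFO
Size                               : SMALL
Virtual Server Id                  : b71ca7e4-f11a-44c5-b45a-656ac05f91b6
"""


def parse_load_balancer(text):
    """Turn 'Key : Value' lines into a dict; lines without ' : ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # only lines that actually contain the separator
            fields[key.strip()] = value.strip()
    return fields


lb = parse_load_balancer(SAMPLE_OUTPUT)
print(lb["UUID"], lb["Log Level"])
```

The UUID identifies the per-load-balancer log folder, and the Log Level field is a quick way to confirm whether debug logging is already on.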

3. Locate Load Balancer Logs

nsxtedge02> exit                   <--- exit the NSX CLI, back to root access
root@nsxtedge02:~#
root@nsxtedge02:~# cd /var/log/lb/                   <--- this is the root folder for the NSX-T Load Balancer
root@nsxtedge02:/var/log/lb# ls
access.log  error.log  ff550f3a-653a-4243-9652-4c3dabfcbfe7        <--- notice that there is a folder with the Load Balancer UUID

root@nsxtedge02:/var/log/lb# cd ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs        <--- note that there is a logs subfolder

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# ls -larth
total 8.0K
drwxr-x--- 3 lb lb 4.0K Oct  1 19:34 ..
-rw-r----- 1 lb lb    0 Oct  1 19:34 access-b71ca7e4-f11a-44c5-b45a-656ac05f91b6.log        <--- without debug level logging, there is only an access log
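
The folder layout above follows a simple pattern: the Load Balancer UUID names the folder, and the Virtual Server Id is embedded in the access log file name. As a rough sketch, the paths can be derived like this in Python. The helper name lb_log_paths is hypothetical, and the pattern is inferred only from the listing in this lab (Edge 2.4.1), so treat it as an assumption on other versions.

```python
from pathlib import PurePosixPath

# Root folder for the NSX-T Load Balancer logs, as seen on the Edge above.
LB_LOG_ROOT = PurePosixPath("/var/log/lb")


def lb_log_paths(lb_uuid, virtual_server_id):
    """Build the per-load-balancer log paths from the two IDs."""
    logs_dir = LB_LOG_ROOT / lb_uuid / "logs"
    return {
        # error.log only shows up in this folder once debug logging is on
        "error": logs_dir / "error.log",
        "access": logs_dir / f"access-{virtual_server_id}.log",
    }


paths = lb_log_paths(
    "ff550f3a-653a-4243-9652-4c3dabfcbfe7",
    "b71ca7e4-f11a-44c5-b45a-656ac05f91b6",
)
for name, path in paths.items():
    print(name, path)
```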

4. Set Error Log Level to Debug

Set the Load Balancer error log level to Debug in the NSX-T Manager web UI.

5. Tail error.log

With the log level set to debug, an error.log file is now created in the logs folder, with all the detail needed for a deep-dive level of analysis. My lab is set up with a simple L4 load balancer for HTTP traffic. Debug logging is very chatty, so a grep-style filter is often required.

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# ls -larth
 total 24K
 drwxr-x--- 3 lb lb 4.0K Oct  1 19:34 ..
 -rw-r----- 1 lb lb    0 Oct  1 19:34 access-b71ca7e4-f11a-44c5-b45a-656ac05f91b6.log
 drwxr-x--- 2 lb lb 4.0K Oct  6 11:48 .
 -rw-r----- 1 lb lb  14K Oct  6 11:48 error.log                        <--- with the debug log level, we now have error.log

Since this file is very chatty, filter the output using grep:

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs#  tail -f error.log | grep l4lb

6. Generate some Application Traffic

In my lab, I have nginx web services running on a pool of two Linux web servers. Let’s generate some L4 Load Balancer traffic by sending HTTP requests to the virtual server.

7. Tail error.log and filter with grep

You can get creative with the grep filter, depending on what traffic type you’re looking to analyze.

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs#  tail -f error.log | grep l4lb
 2019/10/06 12:02:50 [debug] 6773#0: l4lb l4cp_id=0 acquired lock
 2019/10/06 12:02:50 [debug] 6773#0: l4lb current rcvd_ts[dp=0]=171580, new ts rcvd 171798
 2019/10/06 12:02:50 [debug] 6773#0: l4lb purged 0 sessions, setting rcvd_ts[dp=0]=171798, cp_ts[dp=0]=171799 pending=0
 2019/10/06 12:02:50 [debug] 6773#0: l4lb  find vs - (6, 192.169.70.100/80)
 2019/10/06 12:02:50 [debug] 6773#0: l4lb got snat resource for session 9
 2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54842 -> 192.169.70.100/80 => 100.64.0.1/4099 -> 192.168.70.101/80, state valid, cp_sid 0x9, dp_ts 171799
 2019/10/06 12:02:50 [debug] 6773#0: l4lb setting oldest_sess_ts to 171799
 2019/10/06 12:02:50 [debug] 6773#0: l4lb current rcvd_ts[dp=1]=171651, new ts rcvd 171798
 2019/10/06 12:02:50 [debug] 6773#0: l4lb purged 0 sessions, setting rcvd_ts[dp=1]=171798, cp_ts[dp=1]=171799 pending=0
 2019/10/06 12:02:50 [debug] 6773#0: l4lb  find vs - (6, 192.169.70.100/80)
 2019/10/06 12:02:50 [debug] 6773#0: l4lb got snat resource for session A
 2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54841 -> 192.169.70.100/80 => 100.64.0.1/4100 -> 192.168.70.101/80, state valid, cp_sid 0xA, dp_ts 171799
 2019/10/06 12:02:50 [debug] 6773#0: l4lb setting oldest_sess_ts to 171799
 2019/10/06 12:02:50 [debug] 6773#0: l4lb l4cp_id=0 released lock
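
The "create session" lines carry most of the useful detail, so it can help to break them into fields when working through a copy of the log offline. Below is a minimal Python sketch. The field names (client, vip, snat, backend) are my own reading of the line layout, not documented names: I am assuming the pair before "=>" is the client and the virtual server, and the pair after it is the SNAT address and the selected pool member. The "(6)" appears to be the IP protocol number, where 6 is TCP.

```python
import re
from collections import Counter

# Assumed layout: (proto) client -> vip => snat -> backend, state <state>, ...
SESSION_RE = re.compile(
    r"l4lb create session - \((?P<proto>\d+)\) "
    r"(?P<client>\S+) -> (?P<vip>\S+) => "
    r"(?P<snat>\S+) -> (?P<backend>[^,\s]+), state (?P<state>\w+)"
)


def parse_session(line):
    """Return the session fields as a dict, or None for any other log line."""
    match = SESSION_RE.search(line)
    return match.groupdict() if match else None


# Lines copied from the error.log output above.
LINES = [
    "2019/10/06 12:02:50 [debug] 6773#0: l4lb  find vs - (6, 192.169.70.100/80)",
    "2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54842 -> 192.169.70.100/80 => 100.64.0.1/4099 -> 192.168.70.101/80, state valid, cp_sid 0x9, dp_ts 171799",
    "2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54841 -> 192.169.70.100/80 => 100.64.0.1/4100 -> 192.168.70.101/80, state valid, cp_sid 0xA, dp_ts 171799",
]

sessions = [s for s in map(parse_session, LINES) if s]
per_backend = Counter(s["backend"] for s in sessions)

for s in sessions:
    print(s["client"], "->", s["vip"], "via", s["snat"], "->", s["backend"])
print(dict(per_backend))
```

Counting sessions per backend is a quick way to see how new connections are being spread across the pool members.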

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# tail -f error.log | grep '192\.'
2019/10/06 14:26:17 [debug] 6773#0: l4lb  find vs - (6, 192.169.70.100/80)
 2019/10/06 14:26:17 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/51993 -> 192.169.70.100/80 => 100.64.0.1/4109 -> 192.168.70.101/80, state valid, cp_sid 0x17, dp_ts 180406
 2019/10/06 14:26:17 [debug] 6773#0: l4lb  find vs - (6, 192.169.70.100/80)
 2019/10/06 14:26:17 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/51994 -> 192.169.70.100/80 => 100.64.0.1/4110 -> 192.168.70.101/80, state valid, cp_sid 0x18, dp_ts 180406
 2019/10/06 14:26:18 [debug] 6772#0: connect to 192.168.70.101:80, fd:14 #6901
 2019/10/06 14:26:18 [debug] 6772#0: http check recv size: 237, peer: 192.168.70.101:80
 2019/10/06 14:26:18 [debug] 6772#0: http check recv size: -2, peer: 192.168.70.101:80  (11: Resource temporarily unavailable)
 2019/10/06 14:26:18 [debug] 6772#0: hc http parse: rcvd response status 200 from server 192.168.70.101:80(pool LBb49c0d78-6969-4eec-94df-550afc2db827), expected http status code: 2xx - 1 0, 3xx - 0 0, 4xx - 0 0, 5xx - 0 0
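
The health check lines in the same output can be summarized in a similar way. This is a small Python sketch, assuming the "hc http parse" line keeps the format shown above; the function name latest_health is hypothetical. It records the last HTTP status seen for each pool member, which is handy when a member is flapping.

```python
import re

# Matches: hc http parse: rcvd response status 200 from server IP:PORT(pool NAME)
HC_RE = re.compile(
    r"hc http parse: rcvd response status (?P<status>\d{3}) "
    r"from server (?P<server>[\d.]+:\d+)\(pool (?P<pool>[^)]+)\)"
)


def latest_health(lines):
    """Map each pool member to (pool, last HTTP status seen in a health check)."""
    result = {}
    for line in lines:
        match = HC_RE.search(line)
        if match:
            result[match["server"]] = (match["pool"], int(match["status"]))
    return result


# Lines copied from the error.log output above.
LINES = [
    "2019/10/06 14:26:18 [debug] 6772#0: connect to 192.168.70.101:80, fd:14 #6901",
    "2019/10/06 14:26:18 [debug] 6772#0: http check recv size: 237, peer: 192.168.70.101:80",
    "2019/10/06 14:26:18 [debug] 6772#0: hc http parse: rcvd response status 200 from server 192.168.70.101:80(pool LBb49c0d78-6969-4eec-94df-550afc2db827), expected http status code: 2xx - 1 0, 3xx - 0 0, 4xx - 0 0, 5xx - 0 0",
]

health = latest_health(LINES)
for server, (pool, status) in health.items():
    print(server, pool, status)
```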

This other article covers some traffic capture types that are often helpful in NSX-T Load Balancer Troubleshooting.
