One of the best ways to troubleshoot the NSX-T Load Balancer is to carefully monitor and analyze the Edge error log while the Load Balancer error log level is set to debug. Combined with Edge CLI command output, this deep-dive analysis can be very insightful. In this post, we’ll run through a quick list of the steps required for this type of troubleshooting.
This article is not a how-to guide on NSX-T Load Balancer setup. Ronald de Jong does a nice job of that in a two-part article that starts here.
1. SSH into Edge using root account
You can either log in directly with the root account, or log in with the admin account and then enter engineering mode.
Option 1: Accessing the Edge via the root account:
login as: root
root@192.168.110.66's password:
TIPS:
To reconfig management interface, please refer to these CLIs
1) stop service dataplane
2) set interface interface-name vlan vlan-id plane mgmt (for creating vlan sub-interface)
3) set interface interface-name ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
   set interface interface-name dhcp plane mgmt (for dhcp)
4) start service dataplane
To config in-band management interface, please refer to these CLIs
1) set interface mac mac-addr vlan vlan-id in-band plane mgmt
2) set interface eth0.vlan ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
   set interface eth0.vlan dhcp plane mgmt (for dhcp)
Last login: Sun Oct 6 11:27:06 2019 from 192.168.110.10
NOTICE TO USERS
WARNING! Changes made to NSX Data Center while logged in as the root user can cause system failure and potentially impact your network. Please be advised that changes made to the system as the root user must only be made under the guidance of VMware.
root@nsxtedge02:~#

Option 2: Through the admin account and then entering engineering mode:
login as: admin
admin@192.168.110.66's password:
TIPS:
To reconfig management interface, please refer to these CLIs
1) stop service dataplane
2) set interface interface-name vlan vlan-id plane mgmt (for creating vlan sub-interface)
3) set interface interface-name ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
   set interface interface-name dhcp plane mgmt (for dhcp)
4) start service dataplane
To config in-band management interface, please refer to these CLIs
1) set interface mac mac-addr vlan vlan-id in-band plane mgmt
2) set interface eth0.vlan ip x.x.x.x/24 gateway x.x.x.x plane mgmt (for static ip)
   set interface eth0.vlan dhcp plane mgmt (for dhcp)
Last login: Fri Oct 4 12:53:02 2019 from 192.168.110.10
NSX CLI (Edge 2.4.1.0.0.13716583). Press ? for command list or enter: help
nsxtedge02> st en <--- entering engineering mode on the edge
Password: <--- enter the root account password
NOTICE TO USERS
WARNING! Changes made to NSX Data Center while logged in as the root user can cause system failure and potentially impact your network. Please be advised that changes made to the system as the root user must only be made under the guidance of VMware.
root@nsxtedge02:~#
2. Determine the Load Balancer UUID from the NSX CLI
root@nsxtedge02:~# nsxcli
NSX CLI (Edge 2.4.1.0.0.13716583). Press ? for command list or enter: help
nsxtedge02>
nsxtedge02>
nsxtedge02> get load-balancer
Load Balancer
Access Log Enabled : False
Display Name : web-farm
Enabled : True
UUID : ff550f3a-653a-4243-9652-4c3dabfcbfe7 <--- load balancer UUID
Log Level : LB_LOG_LEVEL_INFO
Size : SMALL
Virtual Server Id : b71ca7e4-f11a-44c5-b45a-656ac05f91b6
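If you capture that output to a file or a pipe, the UUID can be pulled out without copying it by hand. This is only a minimal sketch: extract_lb_uuid is a hypothetical helper name of mine, and it assumes the "Field : value" layout shown above, with the UUID on a line that starts with "UUID".

```shell
# Print the load balancer UUID found in `get load-balancer` output on stdin.
# Hypothetical helper; assumes a line of the form "UUID : <uuid>".
extract_lb_uuid() {
  # First keep only the "UUID :" line(s), then print just the UUID itself,
  # ignoring any padding or trailing annotation on the line.
  grep -E '^[[:space:]]*UUID[[:space:]]*:' |
    grep -Eo '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}
```

For example, with the output saved to a file of your choosing, `extract_lb_uuid < lb-output.txt` prints one UUID per load balancer found.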
3. Locate Load Balancer Logs
nsxtedge02> exit <--- exit the NSX CLI, back to root access
root@nsxtedge02:~#
root@nsxtedge02:~# cd /var/log/lb/ <--- this is the root folder for the NSX-T Load Balancer
root@nsxtedge02:/var/log/lb# ls
access.log error.log ff550f3a-653a-4243-9652-4c3dabfcbfe7 <--- notice that there is a folder with the Load Balancer UUID
root@nsxtedge02:/var/log/lb# cd ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs <--- note that there is a logs subfolder
root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# ls -larth
total 8.0K
drwxr-x--- 3 lb lb 4.0K Oct 1 19:34 ..
-rw-r----- 1 lb lb 0 Oct 1 19:34 access-b71ca7e4-f11a-44c5-b45a-656ac05f91b6.log <--- without debug level logging, there is only an access log
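The path layout above (log root, then the UUID, then a logs subfolder) lends itself to a small check for whether error.log exists yet. The sketch below is my own illustration, not an NSX-T tool: lb_log_dir, lb_error_log_status and the LB_LOG_ROOT variable are hypothetical names, and it assumes the per-load-balancer error.log behaves as observed in this lab.

```shell
# Root folder for load balancer logs, as seen on the Edge above.
# Overridable so the helpers can be tried against a scratch directory.
LB_LOG_ROOT="${LB_LOG_ROOT:-/var/log/lb}"

# Print the logs folder for the load balancer UUID given as $1.
lb_log_dir() {
  printf '%s/%s/logs\n' "$LB_LOG_ROOT" "$1"
}

# Report whether error.log is present in that folder.
lb_error_log_status() {
  local dir
  dir="$(lb_log_dir "$1")"
  if [ -f "$dir/error.log" ]; then
    echo "present: $dir/error.log"
  else
    echo "missing: $dir/error.log"
  fi
}
```

Calling `lb_error_log_status ff550f3a-653a-4243-9652-4c3dabfcbfe7` at this point in the walkthrough would report the file as missing.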
4. Set Error Log Level to Debug
Set the Load Balancer Error Log Level to Debug via the NSX-T Manager WebUI.
5. Tail error.log
With the log level set to debug, we now have an error.log created in the logs folder, with all the details required for a deep-dive level of analysis. In my lab, I have a simple L4 load balancer set up for HTTP traffic. Because debug logging is so chatty, a grep-style filter is often required.
root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# ls -larth
total 24K
drwxr-x--- 3 lb lb 4.0K Oct 1 19:34 ..
-rw-r----- 1 lb lb 0 Oct 1 19:34 access-b71ca7e4-f11a-44c5-b45a-656ac05f91b6.log
drwxr-x--- 2 lb lb 4.0K Oct 6 11:48 .
-rw-r----- 1 lb lb 14K Oct 6 11:48 error.log <--- with the debug log level, we now have error.log

Since this is a very chatty file, filter the output using grep:
root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# tail -f error.log | grep l4lb
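A single grep writing to the terminal works fine, but once you chain several filters behind tail -f, grep tends to buffer its output when writing into a pipe, so matches can show up late. The sketch below is one way around that, assuming a grep that supports --line-buffered (GNU grep does). filter_all is a hypothetical helper name, not something that ships on the Edge.

```shell
# Keep only stdin lines that contain every pattern given as an argument.
# Patterns are treated as fixed strings (grep -F), not regular expressions.
filter_all() {
  if [ "$#" -eq 0 ]; then
    cat            # no patterns left: pass lines through unchanged
    return
  fi
  local first="$1"
  shift
  # --line-buffered flushes each match immediately, which matters with tail -f
  grep --line-buffered -F -- "$first" | filter_all "$@"
}
```

For example, `tail -f error.log | filter_all l4lb 'create session'` would show only the session creation lines as they arrive.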
6. Generate some Application Traffic
In my lab, I have nginx web services running on a pool of two Linux web servers. Let’s generate some L4 Load Balancer traffic:
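If you prefer the command line to a browser, a short curl loop run from a client machine does the job. This is only a sketch: send_requests is a hypothetical helper, and the URL you pass in is whatever your own virtual server address is (in this lab the log lines show the virtual server at 192.169.70.100 on port 80).

```shell
# Send COUNT HTTP requests (default 10) to URL and print each status code.
send_requests() {
  local url="$1"
  local count="${2:-10}"
  local i=1
  while [ "$i" -le "$count" ]; do
    # -s: no progress output, -o /dev/null: discard the response body,
    # -w: print only the HTTP status code, --max-time: give up after 5 seconds
    curl -s -o /dev/null -w '%{http_code}\n' --max-time 5 "$url"
    i=$((i + 1))
  done
}
```

For example, `send_requests http://192.169.70.100/ 10` would open ten new connections to the virtual server, each of which should show up in the debug log.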
7. Tail error.log and filter with grep
You can get creative with the grep filter, depending on what traffic type you’re looking to analyze.
root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# tail -f error.log | grep l4lb
2019/10/06 12:02:50 [debug] 6773#0: l4lb l4cp_id=0 acquired lock
2019/10/06 12:02:50 [debug] 6773#0: l4lb current rcvd_ts[dp=0]=171580, new ts rcvd 171798
2019/10/06 12:02:50 [debug] 6773#0: l4lb purged 0 sessions, setting rcvd_ts[dp=0]=171798, cp_ts[dp=0]=171799 pending=0
2019/10/06 12:02:50 [debug] 6773#0: l4lb find vs - (6, 192.169.70.100/80)
2019/10/06 12:02:50 [debug] 6773#0: l4lb got snat resource for session 9
2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54842 -> 192.169.70.100/80 => 100.64.0.1/4099 -> 192.168.70.101/80, state valid, cp_sid 0x9, dp_ts 171799
2019/10/06 12:02:50 [debug] 6773#0: l4lb setting oldest_sess_ts to 171799
2019/10/06 12:02:50 [debug] 6773#0: l4lb current rcvd_ts[dp=1]=171651, new ts rcvd 171798
2019/10/06 12:02:50 [debug] 6773#0: l4lb purged 0 sessions, setting rcvd_ts[dp=1]=171798, cp_ts[dp=1]=171799 pending=0
2019/10/06 12:02:50 [debug] 6773#0: l4lb find vs - (6, 192.169.70.100/80)
2019/10/06 12:02:50 [debug] 6773#0: l4lb got snat resource for session A
2019/10/06 12:02:50 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/54841 -> 192.169.70.100/80 => 100.64.0.1/4100 -> 192.168.70.101/80, state valid, cp_sid 0xA, dp_ts 171799
2019/10/06 12:02:50 [debug] 6773#0: l4lb setting oldest_sess_ts to 171799
2019/10/06 12:02:50 [debug] 6773#0: l4lb l4cp_id=0 released lock

root@nsxtedge02:/var/log/lb/ff550f3a-653a-4243-9652-4c3dabfcbfe7/logs# tail -f error.log | grep '192\.'
2019/10/06 14:26:17 [debug] 6773#0: l4lb find vs - (6, 192.169.70.100/80)
2019/10/06 14:26:17 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/51993 -> 192.169.70.100/80 => 100.64.0.1/4109 -> 192.168.70.101/80, state valid, cp_sid 0x17, dp_ts 180406
2019/10/06 14:26:17 [debug] 6773#0: l4lb find vs - (6, 192.169.70.100/80)
2019/10/06 14:26:17 [debug] 6773#0: l4lb create session - (6) 192.168.110.10/51994 -> 192.169.70.100/80 => 100.64.0.1/4110 -> 192.168.70.101/80, state valid, cp_sid 0x18, dp_ts 180406
2019/10/06 14:26:18 [debug] 6772#0: connect to 192.168.70.101:80, fd:14 #6901
2019/10/06 14:26:18 [debug] 6772#0: http check recv size: 237, peer: 192.168.70.101:80
2019/10/06 14:26:18 [debug] 6772#0: http check recv size: -2, peer: 192.168.70.101:80 (11: Resource temporarily unavailable)
2019/10/06 14:26:18 [debug] 6772#0: hc http parse: rcvd response status 200 from server 192.168.70.101:80(pool LBb49c0d78-6969-4eec-94df-550afc2db827), expected http status code: 2xx - 1 0, 3xx - 0 0, 4xx - 0 0, 5xx - 0 0
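Beyond grep, a little awk can summarize where sessions are landing. The sketch below counts "create session" lines per final destination. It rests on my reading of those lines as client -> virtual server => SNAT address -> pool member, which is an interpretation of the output above rather than a documented format, and count_sessions_per_backend is a hypothetical helper name.

```shell
# Count l4lb "create session" lines per pool member (the address after the
# last "->" on the line). Reads log lines on stdin.
count_sessions_per_backend() {
  awk '
    / l4lb create session / {
      last = 0
      # Find the position of the last "->" token on the line.
      for (i = 1; i <= NF; i++) if ($i == "->") last = i
      if (last == 0 || last == NF) next
      backend = $(last + 1)
      sub(/,$/, "", backend)        # drop the trailing comma
      count[backend]++
    }
    END {
      for (b in count) print count[b], b
    }
  ' | sort -k2
}
```

Running `count_sessions_per_backend < error.log` prints one line per pool member with its session count, which makes an uneven distribution easy to spot.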
This other article covers some traffic capture types that are often helpful in NSX-T Load Balancer Troubleshooting.