Virtual Southwest

Dropped Packets, Out of Buffers, RX Buffers Full... Moving Packets in Software, Part 2

8/6/2021

The diagrams in this post were taken from a Cisco Live presentation I attended on cloud networking.
The challenges of converting an NF (network function) into a VNF (virtualized network function), running on virtualized software and hardware or even in a cloud, are the same for NSX and a growing list of other products.
Some of Cisco's ever-growing list can be found here.
Still borrowing from the presentation, take a look at the path a network packet (OK, or frame) generally follows from the switch into a virtual environment.
The network data comes into the NIC and gets assigned to memory...
OK, then an interrupt request is raised and processed by the hypervisor, I will say...
Then DMA into the kernel packet buffer...
Ah, then finally it gets to user space and, let's say, the virtual machine...
Ah, but then if the RX buffer space is used up, we hit the dropped packets and out-of-buffers errors.
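As a side note, you can actually watch this last step fall over from inside the guest. A minimal sketch, assuming a Linux VM with a vmxnet3 vNIC that shows up as eth0 (your interface name may differ):

# Show the current RX/TX ring sizes for the vNIC
ethtool -g eth0
# Dump the driver counters and watch for drops;
# exact counter names vary by driver version
ethtool -S eth0 | grep -i drop

If the drop counters climb while the rings sit at their default sizes, you are looking at exactly the buffer exhaustion described above.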
It's amazing to me that, with all these steps, a virtual network appliance or VNF of some flavor is able to process data with the speed and low latency that they do, à la NSX-T today!

The proposed solution, at the time of the presentation, was to use the Vector Packet Processor (VPP) modules developed by fd.io.
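If you want a quick taste of what VPP looks like, its CLI is a reasonable place to start. A rough sketch, assuming a lab box with the open-source fd.io VPP packages installed and the daemon running:

# Confirm the daemon is up and check the build
vppctl show version
# Per-interface counters, including drops
vppctl show interface
# Details on the underlying devices
vppctl show hardware-interfaces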
If you're looking to tune an NSX-T implementation for higher performance, check out the Mellanox adapter info here.
The Cisco Live presentation I have borrowed from for this post can be found at:
Cloud Networking BRKCLD2013

Dropped Packets, Out of Buffers, RX Buffers Full... Moving Packets in Software, Part 1

8/2/2021

Just like a physical network switch or router, a virtual machine's vNIC must have buffers to temporarily store incoming network frames for processing. During periods of very heavy load, the guest may not have the cycles to handle all the incoming frames, and the buffer is used to temporarily queue them up. If the buffer fills more quickly than it is emptied, the vNIC driver has no choice but to drop additional incoming frames. This is referred to as a full buffer, or ring exhaustion.
OK, I said moving packets, but technically I guess it's moving network frames, right? Or should we go with bits?
Anyway, there are a number of VMware KB articles regarding dropped packets and performance issues in any virtual guest OS; KB2039495 and KB50121760 are just a few. These dropped-packets/out-of-buffers issues were also prevalent on NSX-V and NSX-T edges, since their vNICs need to process frames in the same manner.
I have troubleshot many issues on NSX-V edges that had high packet loss and out-of-buffers errors. One problem was that prior to NSX-V version 6.4.6, the vNIC RX buffer size was set at 512, even if you increased the size of the edge!
And I have also seen this behavior on NSX-T edges, especially a Tier-0 edge, on versions below 3.1.

Since the KB articles and several other posts go into checking and modifying the RX buffer size on the NSX edge and on a guest VM, I won't go into all the details. Here are the general steps and commands I have used to check for RX buffer issues on an NSX edge or VM:

First, SSH to the ESXi host that the edge or VM is running on and list the ports:
[root@esx4:~] net-stats -l
Sample output:
PortNum          Type SubType SwitchName       MACAddress         ClientName
33554539            5       9 DvsPortset-0     00:50:56:b5:60:9e  Edge01-1.eth0

Next, retrieve the switch statistics for that specific port:
[root@esx4:~] esxcli network port stats get -p 33554539
Sample output:
Packet statistics for port 33554539
   Packets received: 100120460052
   Packets sent: 48907954505
   Bytes received: 64575706925507
   Bytes sent: 9407670139350
   Broadcast packets received: 789
   Broadcast packets sent: 50
   Multicast packets received: 0
   Multicast packets sent: 4
   Unicast packets received: 100120459263
   Unicast packets sent: 48907954451
   Receive packets dropped: 2326648
   Transmit packets dropped: 0
Ouch! So yes, there are receive packets dropped there. Next, I run the vsish command for the specified port number to list the RX statistics for that port:
[root@esx4:~] vsish -e get /net/portsets/DvsPortset-0/ports/33554539/vmxnet3/rxSummary
Sample output abbreviated:
stats of a vmxnet3 vNIC rx queue {
   LRO pkts rx ok:0
   LRO bytes rx ok:0
   pkts rx ok:100341287995
   bytes rx ok:98187846706473
   unicast pkts rx ok:100341286995
   unicast bytes rx ok:98187846646473
   multicast pkts rx ok:0
   multicast bytes rx ok:0
   broadcast pkts rx ok:1000
   broadcast bytes rx ok:60000
   running out of buffers:2368132
   pkts receive error:0
   1st ring size:512
   2nd ring size:128
   # of times the 1st ring is full:2326634
   # of times the 2nd ring is full:0
   fail to map a rx buffer:47
   request to page in a buffer:47
}

From the above, we see that the "running out of buffers" counter is high, and the output also shows the setting "1st ring size:512", which is the limiting factor. This ring size can be increased to 4096 on NSX edges and on most guest OS VMs.
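Before changing anything, it can help to confirm the counter is still climbing rather than being left over from an old traffic spike. A rough sketch from the ESXi shell, reusing the portset and port number from the output above (yours will differ):

# Sample the RX buffer counters every 10 seconds
while true; do
    vsish -e get /net/portsets/DvsPortset-0/ports/33554539/vmxnet3/rxSummary | grep -E 'out of buffers|ring is full'
    sleep 10
done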
VMware KB2039495 and KB1010071 list the steps to modify the RX buffer values on Windows and Linux VMs.
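For a quick flavor of the Linux side, the check-and-change usually comes down to ethtool. A minimal sketch, assuming a vmxnet3 vNIC named eth0 (interface names and supported maximums vary, and the new value does not persist across reboots unless you script it):

# Show the current and pre-set maximum ring sizes
ethtool -g eth0
# Raise the RX ring to 4096
ethtool -G eth0 rx 4096

Run the same check afterwards to confirm the new RX ring size; under the same load, the "running out of buffers" counter on the host should climb far more slowly, if at all.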
The latest NSX versions have addressed the RX buffer issues with the edge sizing, so this problem should be reduced; however, it can still be an issue on any virtual OS or appliance handling high network traffic, as I will try to show in the next post...

So, what happened to the 2020 posts??

8/2/2021

Yes, 2020 did pose some challenges for all of us. I had to take a few months off to deal with personal issues.
And on top of that, I realized I had exceeded the storage limit on my site, and several recent posts were removed...

2020 started off well: I worked at the VMware campus in Palo Alto, California, and came back at the end of February, just as COVID-19 was exploding!
My group has been working remotely since March 2020, and it looks like things on the pandemic front are improving.

Anyway, I am now in the process of updating my web site storage and will re-add some of the missing posts.
A few posts will include:
  • Finding performance issues using Wavefront by VMware, and editing graphs
  • VMware vExpert news
  • VMware vSAN: cleaning up white space
Bear with me during this, and I hope everyone stays safe and healthy!!