Very High snmpd CPU Usage and 1000s of IP Addresses

For a while now we have been experiencing SNMP timeouts from our central monitoring server to a set of new, recently deployed servers. At first we attributed the issue to the network driver being a bit different from what we have used in the past (latest, greatest hardware), and then to the fact that we are completely gutting our production network in situ. Normalizing the driver across this application farm made no difference, and the issue got worse, spreading to just about all servers in this one specific service tier. As it was not impacting any other servers, it wasn't surprising that we could not find an issue at the network layer, so we had to keep digging.

In our environment this tier is unique for two reasons:

  1. Very high IO per server. On the order of 10,000 write ops to disk per second at peak load.
  2. Thousands of private IPs per system (added as ip rules – not virtual addresses on interface devices).

The first issue was in fact an issue, just not for SNMP. The snmpd process was getting lost in the noise generated by the high system load the disk IO was causing (lots of concurrent processes in this app). The team quickly addressed this and scaled the IO subsystem substantially to meet the growing workload. After this change the system load dropped from over 1000 to under 10 (1-minute average) during peak utilization on these servers. Unfortunately snmpd was still timing out. However, with the noise of all those blocked processes out of the way, running top showed the snmpd process stuck at 100 percent CPU usage for very long periods of time. In fact, periods of less than 100 percent CPU were notable for their rarity.

That led us to difference number two. We had recently been normalizing our system configurations, which resulted in the whole tier of afflicted servers having even more IPs assigned. OK, great, the team thought – we know where that lives in net-snmp: the IP-MIB. Our assumption was that net-snmp should only refresh this MIB based on client requests, and for some reason our monitoring solution was requesting all the IPs from the IP-MIB portion. That would certainly explain the timeouts, we thought. Step one was to try the latest net-snmp, just in case it was a bug in our old default 5.4 install. No change. So the team's next troubleshooting step was to essentially block access to the IP-MIB part of the OID tree.

We tried various versions of:

view systemview excluded ip.ipAddressIfIndex.ipv4
view systemview excluded ip.ipAddressType.ipv4
view systemview excluded .1.3.6.1.2.1.4.20.1

and so on, to no effect. Next we did the obvious test – block all client access. No change: snmpd was still consuming 100% of a CPU all the time. Now what?

Unfortunately we were not on Solaris, so no DTrace. However, strace did provide the clues needed to make progress. The strace of the snmpd process was almost entirely ioctl calls to the interface devices to fetch the IP addresses associated with them. The calls themselves were not so much the issue, as the response times of the ioctl calls were in microseconds. A continuous snmpget was executed against a system while the strace was captured, and the strace output was compared to the client experience. Client timeouts lined up exactly with when the ioctl “storm” would start. As the system time consumed by snmpd during this was tiny, the issue had to be in the user-land component of net-snmp.

One thing the strace showed was that snmpd was almost always determining all the IPs on each interface in what looked like a very tight loop. There were perhaps two to three seconds between each storm. In addition, each subsequent ioctl call involved more and more IPs per call – almost displaying O(n^2) behavior, as the list of IPs from the previous call was listed again with only one more IP added. So now we knew snmpd was, for some reason, always rebuilding the .1.3.6.1.2.1.4 tree and doing it in a nasty way.
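
For anyone who has not stared at this particular strace before, the ioctl in question is the classic SIOCGIFCONF pattern: ask the kernel for every interface/address pair, growing the buffer until the answer fits. The stand-alone sketch below is not net-snmp code – just a minimal illustration of that pattern – but it shows why a box with thousands of addresses hands back thousands of ifreq entries on every single refresh.

[code language="c"]
/* Minimal stand-alone sketch (not net-snmp code): enumerate IPv4 addresses
 * via SIOCGIFCONF, the kind of ioctl traffic that showed up in the strace. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);
    if (sd < 0) { perror("socket"); return 1; }

    struct ifconf ifc;
    int len = 64 * sizeof(struct ifreq);
    char *buf = NULL;

    /* Grow the buffer until the kernel's answer fits; each pass is one more
     * ioctl of the kind seen in the "storm". */
    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) { perror("realloc"); return 1; }
        buf = tmp;
        ifc.ifc_len = len;
        ifc.ifc_buf = buf;
        if (ioctl(sd, SIOCGIFCONF, &ifc) < 0) { perror("SIOCGIFCONF"); return 1; }
        if (ifc.ifc_len < len)
            break;          /* everything fit */
        len *= 2;           /* possibly truncated; try a bigger buffer */
    }

    /* One ifreq entry per returned address. */
    int n = ifc.ifc_len / sizeof(struct ifreq);
    for (int i = 0; i < n; i++) {
        struct ifreq *ifrp = &ifc.ifc_req[i];
        if (ifrp->ifr_addr.sa_family != AF_INET)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)&ifrp->ifr_addr;
        printf("%-12s %s\n", ifrp->ifr_name, inet_ntoa(sin->sin_addr));
    }

    free(buf);
    close(sd);
    return 0;
}
[/code]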

The first thing that jumped to mind was that the caching we assumed Net-SNMP would do on the IP table was misconfigured somewhere. We attacked the nsCacheTable and its ilk from all directions. No luck and no change in behavior. At this point a more thorough web search found two similar postings describing high snmpd CPU usage: one for a system with a very large BGP routing table and another for a system with thousands of VLANs. Neither offered a solution, though the BGP posting did note that sorting within snmpd was not very efficient. At this point your author started changing some of the cache constants within the net-snmp source code to see if the polling cycle on the interfaces would change. It did not.

Time to turn on debug:
The snmpd daemon that comes with Net-SNMP has a pretty thorough debug mode. So thorough, in fact, that we left this step for last: in our environment the full debug mode generates about 75MB of log data every five seconds (remember all those IP addresses). We had tried various documented methods to only run debug on certain sub-modules like the IP-MIB, but we could not find a working command line option. The documented examples on the net-snmp wiki pages did not work at all on our build for some reason. So we were forced to deal with the fire hose of logging everything.

With full debug on (-DALL) snmpd was started. Five seconds or so was all that was needed to generate a trace that would help us track down the issue. The net-snmp agent code is full of logging calls like the one below:

DEBUGMSGTL(("access:ipaddress:container", "processing %d interfaces\n", interfaces));

which will generate a message like:

trace: _netsnmp_ioctl_ipaddress_container_load_v4(): ip-mib/data_access/ipaddress_ioctl.c, 171:
access:ipaddress:container: processing 4 interfaces
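
As an aside, the first element of that double-parenthesized argument is the debug token, which is what the -D switch is supposed to filter on. We never got token-scoped debugging to behave from the command line on our build, but the library API underneath can be exercised directly. The stand-alone sketch below is just an illustration of the token machinery (the file name and compile command in the comment are our assumptions), not anything from the snmpd source:

[code language="c"]
/* Illustrative only: exercise the net-snmp debug token machinery outside
 * snmpd.  Assumes the net-snmp development headers and library are
 * installed; something along the lines of
 *     gcc dbgdemo.c -o dbgdemo `net-snmp-config --cflags --libs`
 * should build it. */
#include <net-snmp/net-snmp-config.h>
#include <net-snmp/net-snmp-includes.h>

int main(void)
{
    snmp_enable_stderrlog();                     /* send log output to stderr  */
    snmp_set_do_debugging(1);                    /* equivalent of passing -D   */
    debug_register_tokens("access:ipaddress");   /* only print matching tokens */

    /* Matches the registered token prefix, so it is printed. */
    DEBUGMSGTL(("access:ipaddress:container", "processing %d interfaces\n", 4));

    /* Does not match, so it is silently dropped. */
    DEBUGMSGTL(("some:other:token", "never shown\n"));

    return 0;
}
[/code]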

This makes it very easy to find where in the code a message was generated and simplifies tracing the application (relatively – there are hundreds of modules). So at this point we did the tedious task of simply walking the log file in parallel with walking the code, focusing obviously on the interface and IP portions. Along the way numerous little hooks and debug messages were dropped into the code and snmpd recompiled. This helped us realize that the real problem was much larger than what we wanted to tackle. Essentially, the table access and cache components just will not work when one has thousands of IP addresses. Best we could tell, snmpd is re-sorting the IP OID tree after each and every IP address it comes across. Following the debug output, you can see it do a table compare after each and every IP even on a small test system. On our production boxes with over 7000 IPs snmpd is spending all its time doing index compares. The full debug snippet is available if anyone wants it.
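
To put rough numbers behind that: keeping a list sorted by comparing and shifting on every insert costs on the order of n²/4 comparisons for n entries, versus roughly n·log₂(n) if you load everything first and sort once. The toy below (plain integers, nothing to do with net-snmp's actual containers or OID compares) just counts comparisons for n = 7000, roughly our production IP count.

[code language="c"]
/* Toy illustration, not net-snmp code: count comparisons when keeping a
 * list sorted on every insert versus loading it all and sorting once. */
#include <stdio.h>
#include <stdlib.h>

#define N 7000

static long compares;

static int cmp_count(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    compares++;
    return (x > y) - (x < y);
}

/* Keep the array sorted as each value arrives (shift the tail each time). */
static void insert_sorted(int *arr, int len, int v)
{
    int i = len;
    while (i > 0 && (compares++, arr[i - 1] > v)) {
        arr[i] = arr[i - 1];
        i--;
    }
    arr[i] = v;
}

int main(void)
{
    int *input = malloc(N * sizeof(int));
    int *sorted = malloc(N * sizeof(int));

    srand(42);
    for (int i = 0; i < N; i++)
        input[i] = rand();

    compares = 0;
    for (int i = 0; i < N; i++)
        insert_sorted(sorted, i, input[i]);
    printf("re-sort on every insert: %ld comparisons\n", compares);

    compares = 0;
    qsort(input, N, sizeof(int), cmp_count);
    printf("sort once at the end:    %ld comparisons\n", compares);

    free(input);
    free(sorted);
    return 0;
}
[/code]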

Realizing there was no way for us to quickly and safely modify either the cache code (honestly, we couldn't figure out why the IP table was refreshed as often as it was; even interface stats (.1.3.6.1.2.1.2.2) were being refreshed far more often than nsCacheTable would imply) or the table and index code, we decided to cut the monster off at the head – the call into the “access:ipaddress:container” portion of the code. Through a bit of debugging your author determined the head of the monster was the _netsnmp_ioctl_ipaddress_container_load_v4 function in the ip-mib/data_access/ipaddress_ioctl.c source file. This function seemed to lead to the discovery of all the IP addresses associated with an interface and hence all the sorting, indexing, and other madness we were experiencing.

Solution
The solution is neither pretty nor elegant and is a bit of the nuclear fly-swatter type. The function identified above loops over all the physical interfaces with IP addresses assigned. In our case that is the loopback and a set of bonded interfaces on various VLANs. Luckily for us, the thousands of IPs we have on each server sit on a set of dedicated bonds on their own VLANs. These are completely separate from the base server VLAN and other “user experienced” addresses. In addition, there is nothing related to these addresses that we have ever used or needed via SNMP. As we determined, our monitoring software didn't even know these IPs existed and never really hit anything in the IP-MIB table. All it ever knew or cared about was the base IP assigned to each bonded interface.

So in our case the obvious solution was simply not to process these bonded interfaces at all when determining what IP addresses were on them. We found some references in the code to what is essentially a black list, but we could not figure out how to use it. So we did the next best thing – modified the loop in _netsnmp_ioctl_ipaddress_container_load_v4 to just skip a set of known bonds, essentially doing something like:
[code language="c"]
if (strcmp(ifrp->ifr_name, "bond1.16") == 0) {
    DEBUGMSGTL(("access:ipaddress:skipping",
                " interface %d, %s\n", i, ifrp->ifr_name));
    continue;
}
[/code]

The full diff is below. Note the code was written so it would be easy to generate a patch from our automation tools (i.e. as new bonds come and go) and so that it is obvious, even to folks without much C experience, what to change. Not super efficient, but good enough. After making this change snmpd CPU usage went from a continuous 100 percent to under 0.5 percent. Better yet – no more timeouts! All we lost was access to the portion of the IP table for the excluded interfaces, something we didn't use anyway.

We still don’t know why snmpd was determined to rebuild the IP-MIB continuously. As our use case is probably pretty rare, it is not surprising there are so few reports of this behavior.

[code language="diff"]
--- ipaddress_ioctl.c.new
+++ ipaddress_ioctl.c.orig
@@ -165,11 +165,6 @@
         DEBUGMSGTL(("access:ipaddress:container",
                     " interface %d, %s\n", i, ifrp->ifr_name));
 
-        /* Ops was here */
-
-        DEBUGMSGTL(("access:ipaddress:containerSPNew",
-                    " interface %d, %s\n", i, ifrp->ifr_name));
-
         if (AF_INET != ifrp->ifr_addr.sa_family) {
             DEBUGMSGTL(("access:ipaddress:container",
                         " skipping %s; non AF_INET family %d\n",
@@ -177,36 +172,6 @@
             continue;
         }
 
-        /* Working around issue with 6000 IPs on a host
-         * So we exclude known problem interfaces
-         */
-
-        if (strcmp(ifrp->ifr_name,"bond1.16") == 0) {
-            DEBUGMSGTL(("access:ipaddress:skipping",
-                        " interface %d, %s\n", i, ifrp->ifr_name));
-            continue;
-        }
-
-        if (strcmp(ifrp->ifr_name,"bond1.20") == 0) {
-            DEBUGMSGTL(("access:ipaddress:skipping",
-                        " interface %d, %s\n", i, ifrp->ifr_name));
-            continue;
-        }
-
-        if (strcmp(ifrp->ifr_name,"bond1.24") == 0) {
-            DEBUGMSGTL(("access:ipaddress:skipping",
-                        " interface %d, %s\n", i, ifrp->ifr_name));
-            continue;
-        }
-
-        if (strcmp(ifrp->ifr_name,"bond1.28") == 0) {
-            DEBUGMSGTL(("access:ipaddress:skipping",
-                        " interface %d, %s\n", i, ifrp->ifr_name));
-            continue;
-        }
-
-        /* End Ops Hack */
-        /* */
 
         entry = netsnmp_access_ipaddress_entry_create();
[/code]
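
For what it's worth, if you need to adapt this to a changing set of interfaces, the same hack can be made slightly more table-driven so automation only has to regenerate one array. The fragment below is a hypothetical variant meant to be dropped into the same loop in ipaddress_ioctl.c (the bond names are just our examples); we stuck with the repetitive version above because it is more obvious to non-C folks, but the idea is the same:

[code language="c"]
/* Hypothetical variant of the same hack: the interfaces to skip live in a
 * single NULL-terminated array, so adding or removing a bond is a one-line
 * change for automation.  Names below are examples from our setup. */
static const char * const skip_ifaces[] = {
    "bond1.16", "bond1.20", "bond1.24", "bond1.28", NULL
};

static int
_skip_interface(const char *name)
{
    const char * const *p;
    for (p = skip_ifaces; *p != NULL; p++)
        if (strcmp(name, *p) == 0)
            return 1;
    return 0;
}

/* ...and inside the interface loop in _netsnmp_ioctl_ipaddress_container_load_v4: */
if (_skip_interface(ifrp->ifr_name)) {
    DEBUGMSGTL(("access:ipaddress:skipping",
                " interface %d, %s\n", i, ifrp->ifr_name));
    continue;
}
[/code]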
