
crisponions · 03-08-2016

Last week a NAT rule was set up on one of our ASAs that caused it to start replying to all ARP requests on its inside interface. The result was a very slow meltdown: some systems could reach servers and some couldn't, and our VMs were affected more than our physical servers.

The reports we received, and what we observed, made it look like a problem with either the core switch or the server switch stack. We eventually found the cause in a packet capture, which showed the ASA responding to all ARP requests.

What I can't figure out is... the ARP tables on my switches all looked correct. If I had seen the switch ARP tables filled with many entries pointing to the same MAC address, we could have identified the problem much more quickly. Since that wasn't the case (the IP-to-MAC mappings were correct), it has made me question my knowledge! Can anyone provide insight into why this was the case?

Peter Paluch · 03-08-2016

Hello,

What I can't figure out is... my arp tables all looked correct on my switches. If I would have seen my switch arp tables filled with many entries of the same mac address we could have more easily/quickly identified the problem.

As we are talking about Proxy ARP wreaking havoc with your network, and ARP's scope is always a single broadcast domain, I assume that your clients were placed in the same VLAN as the ASA. Am I correct in this assumption? If so, the ARP tables on your switches were not that relevant. You see, if the clients and the ASA were in the same VLAN, then the switches mattered only as Layer 2 devices. Their ARP tables would be relevant if these switches were routing between VLANs, but if ARP was at the core of the issue, then clearly the issue was contained to a single VLAN. If the switches did not need to resolve the clients' IPs into MAC addresses, their ARP tables would not be updated and could contain old entries.

That's my first guess. However, it would be helpful to understand the logical topology of your network: where the clients are, where the ASA is, and which routers and L3 networks sit in between, so that the scope of the ARP issue can be properly assessed.
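To illustrate the reasoning above: with Proxy ARP misbehaving inside one VLAN, the "many IPs, one MAC" pattern would be expected in the ARP caches of the hosts in that VLAN rather than on switches acting purely at Layer 2. A minimal Python sketch of such a check follows. It assumes you have already collected (IP, MAC) pairs from an affected host; the function name, the threshold, and the sample addresses are all hypothetical, and parsing the host's actual ARP output is not shown.

```python
from collections import defaultdict


def find_shared_macs(arp_entries, threshold=3):
    """Return {mac: sorted IPs} for every MAC mapped to at least `threshold` IPs.

    `arp_entries` is an iterable of (ip, mac) string pairs. The threshold is
    an arbitrary illustrative value, not a recommended setting.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        # Normalise case so the same MAC is not counted twice.
        ips_by_mac[mac.lower()].add(ip)
    return {
        mac: sorted(ips)
        for mac, ips in ips_by_mac.items()
        if len(ips) >= threshold
    }


# Hypothetical sample: three server IPs all resolving to one MAC,
# plus one unrelated, normal-looking entry.
sample = [
    ("10.0.0.10", "00:11:22:33:44:55"),
    ("10.0.0.11", "00:11:22:33:44:55"),
    ("10.0.0.12", "00:11:22:33:44:55"),
    ("10.0.0.1", "AA:BB:CC:DD:EE:01"),
]

suspects = find_shared_macs(sample)
for mac, ips in suspects.items():
    print(f"{mac} answers for {len(ips)} IPs: {', '.join(ips)}")
```

A MAC flagged this way is not proof of a problem on its own (a router or load balancer may legitimately answer for several addresses), so treat it as a hint about where to capture traffic next.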

Best regards,
Peter

crisponions · 03-08-2016

Thanks for the reply. It is a flat network: all clients, servers, and the ASA are in the same VLAN. I was originally troubleshooting from another location, so there would have been routing between the VLAN interface (core switch) and the network I was on.

I started with packet captures taken remotely on the uplinks from the core switch. I could see the servers/devices replying to pings, but the core switch never forwarded the replies on (routed them back to me). I would then check the ARP table on the core switch, and everything looked correct. At this point I assumed the issue was with the core switch.

I've attached a simple network diagram.

Peter Paluch · 03-10-2016

Hello,

I am sorry for responding late.

Sometimes, the simplest explanation tends to be the correct one... I am thinking about the fact that the default ARP timeout on Cisco IOS-based devices is 4 hours, significantly longer than the defaults on Windows or Linux operating systems. If the problem occurred and was resolved within this time frame, it is possible that the switches already had their ARP entries cached and didn't need to refresh them, which would explain why the entries you saw were correct. Would this fit your experience?
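The timing argument above can be sketched as a small calculation. This is a deliberately simplified model, not how any real ARP implementation behaves in detail: it ignores refreshes triggered by traffic and assumes an entry simply expires a fixed time after it was learned. The function name, the timestamps, and the 60-second comparison value are hypothetical; only the 4-hour figure comes from the post.

```python
IOS_ARP_TIMEOUT_S = 4 * 60 * 60  # 4 hours, the IOS default mentioned above


def cached_entry_survives(learned_at_s, incident_start_s, incident_end_s,
                          timeout_s=IOS_ARP_TIMEOUT_S):
    """Return True if an entry learned before the incident outlives it.

    Simplified model: the entry expires exactly `timeout_s` seconds after
    it was learned and is never refreshed in between.
    """
    expires_at_s = learned_at_s + timeout_s
    return learned_at_s <= incident_start_s and expires_at_s >= incident_end_s


HOUR = 3600

# Hypothetical timeline: entry learned at t=0, incident from t=1h to t=3h.
switch_kept_good_entry = cached_entry_survives(0, 1 * HOUR, 3 * HOUR)

# Same timeline with an illustrative 60-second timeout (not a real default):
# the entry expires before the incident ends, so the host has to re-ARP
# while bogus replies are on the wire.
host_kept_good_entry = cached_entry_survives(0, 1 * HOUR, 3 * HOUR, timeout_s=60)

print(switch_kept_good_entry, host_kept_good_entry)
```

Under these assumptions, a device with a long timeout can sit through the whole incident holding a correct entry, while a device with a short timeout cannot.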

Best regards,
Peter

proxy arp war story - what happened?