Site-to-site VPN failover causing memory spike in spoke sites

PikaPika · ‎04-23-2024

Hi all. I would like to get your thoughts on an issue I have been facing since implementing dual ISP and failover for site-to-site VPN tunnels.

Hub Site:
ASA5515 using 9.6(4)42

Spoke sites:
Cisco ISRs running 15.1(4)M8, with VoIP services over the IPsec tunnel

Diagram below.

Spoke site config example:

crypto isakmp policy 10
encr <aes>
authentication pre-share
group <>
crypto isakmp key <key> address 1.1.1.1 no-xauth
crypto isakmp key <key> address 2.2.2.2 no-xauth
crypto isakmp keepalive 10 3
!
!
crypto ipsec transform-set <>
crypto ipsec df-bit clear
!
crypto map <> 10 ipsec-isakmp
set peer 1.1.1.1 default
set peer 2.2.2.2
set security-association lifetime kilobytes 1024000
set security-association lifetime seconds 28800
set transform-set <>
set pfs <>

The dual ISP setup is working fine. When ISP1 goes down, all the spoke routers build a tunnel to 2.2.2.2 and hub-to-spoke connectivity is restored. However, we observe a memory spike (in SolarWinds) on almost all the spoke routers, which impacts all voice services. While memory utilization is high, we can't even telnet to the routers, so we end up reloading them to put them back in service. It's quite annoying because we need to engage an onsite contact to reload the router every time.

Could you kindly provide some information? We hadn't encountered this issue prior to implementing Dual ISP in the Hub, so I'm wondering if it could be related to DPD or keepalives. I'm interested in hearing your thoughts and learning if others have experienced a similar issue and any potential workarounds.

Any suggestions or feedback would be greatly appreciated. Thank you!

ASA Dual ISP setup

MHM Cisco World · ‎04-23-2024

How are you using the same peer IP for two spokes?

MHM

PikaPika · ‎04-24-2024

Hi. Apologies if my diagram was confusing. I have 1.1.1.1 (not the real IP) configured on the OUTSIDE interface and 2.2.2.2 on the OUTSIDE_2 interface. On the ASA, a crypto map is applied to both outside interfaces. Each site-to-site tunnel has its own crypto map entry with a sequence number.

On each spoke site, we define the two IP addresses above as peers. I hope that makes sense.

MHM Cisco World · ‎04-24-2024

Since the ASA is the hub for two spokes, use the following.

On the ASA, ACL for spoke 1:
LAN behind ASA
LAN behind spoke 2

On the ASA, ACL for spoke 2:
LAN behind ASA
LAN behind spoke 1

On spoke 1, the ACL will be:
LAN behind ASA
LAN behind spoke 2

On spoke 2, the ACL will be:
LAN behind ASA
LAN behind spoke 1

And of course you need routes.

On spoke 1, route:
LAN behind spoke 2 toward the ASA

On spoke 2, route:
LAN behind spoke 1 toward the ASA

That's it.

MHM

MHM Cisco World · ‎04-24-2024

Did you configure it as above? Please confirm.

MHM

PikaPika · ‎04-24-2024

In my setup, Spoke 1 doesn't talk to Spoke 2.

So on the ASA:

ACL for spoke 1 -> LAN behind ASA
ACL for spoke 2 -> LAN behind ASA

In the spoke 1 ACL:
Spoke 1 local subnet to LAN behind ASA

In the spoke 2 ACL:
Spoke 2 local subnet to LAN behind ASA

Routes:
On spoke 1, a route for the LAN behind the ASA
On spoke 2, a route for the LAN behind the ASA

MHM Cisco World · ‎04-24-2024

And the route in ASA is

Route ISP1 spoke1 5

Route ISP2 spoke1 10

Route ISP1 spoke2 10

Route ISP2 spoke2 5

MHM

PikaPika · ‎04-24-2024

The route in ASA is:

Route ISP1 1 track 1 (administrative distance set to 1 and tracking the next hop using ip sla)

Route ISP2 200 (administrative distance set to 200 and gets used if ISP1 route goes down)
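
For anyone following along, a tracked primary route with a floating backup on the ASA usually looks something like the sketch below. The interface names come from this thread. The next-hop addresses, the use of default routes, the probe frequency, and the SLA/track ID 1 are placeholder assumptions, not the real values.

```
! Probe the ISP1 next hop (placeholder 198.51.100.1) through OUTSIDE
sla monitor 1
 type echo protocol ipIcmpEcho 198.51.100.1 interface OUTSIDE
 frequency 5
sla monitor schedule 1 life forever start-time now
!
! Track object 1 follows the reachability result of SLA 1
track 1 rtr 1 reachability
!
! Primary route: distance 1, removed when track 1 goes down
route OUTSIDE 0.0.0.0 0.0.0.0 198.51.100.1 1 track 1
! Backup route: distance 200, used only when the primary is withdrawn
route OUTSIDE_2 0.0.0.0 0.0.0.0 203.0.113.1 200
```

show track and show route on the ASA confirm which of the two routes is installed at any moment.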

MHM Cisco World · ‎04-24-2024

VoIP is UDP-based, and the ASA needs some configuration to tear down the existing connections.

On the ASA

there are two timeouts:

1- timeout floating-conn <- needed so the ASA terminates existing connections when an ISP fails

2- timeout h323 <- needed to terminate the TCP/UDP connections
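
As a rough sketch of the two timers (the values are illustrative assumptions, not recommendations): floating-conn is disabled by default (0:00:00) and closes connections only when a better route becomes available, so it mainly matters when traffic moves back to the preferred ISP.

```
! Close connections so they can be re-established over a better route
timeout floating-conn 0:00:30
! Idle time after which H.245 (TCP) and H.323 (UDP) media connections close
! (0:05:00 is the default)
timeout h323 0:05:00
!
! Check the values currently in effect
show running-config timeout
```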

Hope this helps.

MHM

PikaPika · ‎04-25-2024

Thank you for the suggestions. I will try these cmds over the weekend.

tvotna · ‎04-24-2024

Very hard to tell, because 15.1(4)M8 is many many years old. Any particular reason you use this version?

It's highly unlikely that the issue happens because of VPN. It's probably related to VoIP. I think you need to collect memory stats on the routers periodically to rule out a slow memory leak, and install a console server so you can connect to one of them when failover happens and collect stats again. Commands:

show memory statistics
show memory statistics history
show memory allocating-process totals
show proc memory sorted holding
show region

to begin with.
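
One way to collect these periodically without logging in is an EEM applet on the router. This is only a sketch: the applet name, the 15-minute interval, and the flash:mem-stats.txt file are made up for illustration, and appending to flash may not be supported on every filesystem.

```
! Hypothetical applet: every 900 seconds, append a timestamp and
! memory snapshots to a file so the trend survives a reload
event manager applet MEM-SNAPSHOT
 event timer watchdog time 900
 action 1.0 cli command "enable"
 action 2.0 cli command "show clock | append flash:mem-stats.txt"
 action 3.0 cli command "show memory statistics | append flash:mem-stats.txt"
 action 4.0 cli command "show processes memory sorted | append flash:mem-stats.txt"
```

Comparing snapshots taken before and after a failover should show whether one process keeps growing steadily or whether free memory drops suddenly at the failover.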

PikaPika · ‎04-24-2024

Thanks for your response and suggestions. We are using Cisco 38xx or 39xx routers for E1 to H.323 VoIP calls in a closed network environment, and this IOS has been pretty stable for years. Hence we didn't move to newer versions.

These commands seem very helpful and I will collect the outputs when we encounter the issue again. During the most recent occurrence, I collected the output of the following commands and am attaching it here.

show memory summary

sh processes memory sorted

The strange thing is, I am seeing the CCSIP_SPI_CONTROL process holding the most memory even though we are not using SIP.

tvotna · ‎04-24-2024

Right, if you search for CCSIP_SPI_CONTRO on cisco.com you'll find an enormous number of memory leak bugs. What you see doesn't look normal:

PID TTY  Allocated      Freed   Holding Getbufs Retbufs Process
384   0 2589217900 2017306092 283503608       0       0 CCSIP_SPI_CONTRO
  1   0  202021948     358168 201761856       0       0 Chunk Manager

The process is still running even if you are not using SIP, so it can still leak memory. Such bugs are rare, but not impossible. Also, the process listens on TCP/5060, UDP/5060 and on the SIP-TLS port. Your network is closed, but I'd still disable the corresponding ports if you don't need SIP. Refer to:

https://www.cisco.com/c/en/us/support/docs/csa/cisco-sa-sip-Cv28sQw2.html
https://www.cisco.com/c/en/us/support/docs/csa/cisco-sa-20190925-sip-dos.html
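
One common way to close the SIP listeners is under sip-ua. A sketch, assuming the router really has no SIP dial-peers or SIP trunks in use (test on one router first, ideally with no active calls):

```
! Stop the router from listening on UDP 5060, TCP 5060 and the SIP-TLS port
sip-ua
 no transport udp
 no transport tcp
 no transport tcp tls
!
! Afterwards, check that nothing is listening on the SIP ports
show control-plane host open-ports
```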

Chunk Manager also stands out. Maybe

show chunk summary

will give you a clue.

You can also upgrade one of the routers to a more recent version and see if that helps.

PikaPika · ‎04-25-2024

Thank you for the feedback and suggestions. I will try to upgrade one of the routers to a newer IOS and also disable the SIP ports as mentioned in the doc. I'm just preparing the docs and changes to submit for approval.