
Site-to-site vpn failover causing memory spike in spoke sites

PikaPika
Level 1

Hi all. I would like to get your thoughts on an issue I have been facing since implementing dual ISP and failover for site-to-site VPN tunnels.

Hub Site:
ASA 5515 running 9.6(4)42

Spoke sites:
Cisco ISRs running 15.1(4)M8, carrying VoIP services over the IPsec tunnel

Diagram below.

Spoke site config example:

crypto isakmp policy 10
 encr <aes>
 authentication pre-share
 group <>
crypto isakmp key <key> address 1.1.1.1 no-xauth
crypto isakmp key <key> address 2.2.2.2 no-xauth
crypto isakmp keepalive 10 3
!
!
crypto ipsec transform-set <>
crypto ipsec df-bit clear
!
crypto map <> 10 ipsec-isakmp
 set peer 1.1.1.1 default
 set peer 2.2.2.2
 set security-association lifetime kilobytes 1024000
 set security-association lifetime seconds 28800
 set transform-set <>
 set pfs <>


The dual ISP setup itself is working fine. When ISP1 goes down, all the spoke routers build a tunnel to 2.2.2.2 and hub-to-spoke connectivity is restored. However, we observe a memory spike (reported by SolarWinds) on almost all the spoke routers, which impacts all voice services. During high memory utilization we can't even telnet to the routers, so we end up reloading them to put them back in service. It's quite annoying because we need to engage an onsite contact to reload the router every time.

Could you kindly provide some information? We hadn't encountered this issue prior to implementing Dual ISP in the Hub, so I'm wondering if it could be related to DPD or keepalives. I'm interested in hearing your thoughts and learning if others have experienced a similar issue and any potential workarounds.

Any suggestions or feedback would be greatly appreciated. Thank you!

Figure: ASA Dual ISP setup

13 Replies

How do you use the same peer IP for two spokes?

MHM

Hi. Apologies if my diagram was confusing. I have 1.1.1.1 (not the real IP) configured on the OUTSIDE interface and 2.2.2.2 on the OUTSIDE_2 interface. On the ASA, I have a crypto map applied to both outside interfaces. For each site-to-site tunnel, we have a crypto map entry with its own sequence number.

On the spoke sites, we define the above two IP addresses as peers. I hope that makes sense.

On the ASA, since it is the hub for two spokes, use:

ACL for spoke1: LAN behind ASA and LAN behind spoke2

ACL for spoke2: LAN behind ASA and LAN behind spoke1

On spoke1, the ACL will cover: LAN behind ASA and LAN behind spoke2

On spoke2, the ACL will cover: LAN behind ASA and LAN behind spoke1

And of course you need routes:

On spoke1: route for LAN behind spoke2 toward the ASA

On spoke2: route for LAN behind spoke1 toward the ASA

That's it.

MHM

Did you configure it as above? Please confirm.

MHM

In my setup, spoke1 doesn't talk to spoke2.

So on the ASA:

ACL for spoke1 -> LAN behind ASA

ACL for spoke2 -> LAN behind ASA

On spoke1, the ACL is:

Spoke1 local subnet to LAN behind ASA

On spoke2, the ACL is:

Spoke2 local subnet to LAN behind ASA

Routes:

On spoke1, a route for the LAN behind the ASA

On spoke2, a route for the LAN behind the ASA

And the routes on the ASA are:

Route ISP1 spoke1 5

Route ISP2 spoke1 10

Route ISP1 spoke2 10

Route ISP2 spoke2 5

MHM

The routes on the ASA are:

Route ISP1 1 track 1 (administrative distance 1, tracking the next hop using IP SLA)

Route ISP2 200 (administrative distance 200; used if the ISP1 route goes down)
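
For readers following along, a tracked primary route with a floating backup on an ASA typically looks something like the sketch below. The interface names OUTSIDE and OUTSIDE_2 are the ones mentioned earlier in this thread; the gateway addresses 198.51.100.1 and 203.0.113.1, the SLA ID, and the timers are made-up placeholders, not the real values from this deployment.

```
! Hypothetical values throughout: replace gateways and timers with your own.
! Probe the ISP1 next hop with ICMP echo every 10 seconds.
sla monitor 1
 type echo protocol ipIcmpEcho 198.51.100.1 interface OUTSIDE
 num-packets 3
 frequency 10
sla monitor schedule 1 life forever start-time now
!
! Track object 1 follows the reachability result of SLA operation 1.
track 1 rtr 1 reachability
!
! Primary default route (distance 1) is withdrawn when track 1 goes down.
route OUTSIDE 0.0.0.0 0.0.0.0 198.51.100.1 1 track 1
! Backup default route (distance 200) takes over only in that case.
route OUTSIDE_2 0.0.0.0 0.0.0.0 203.0.113.1 200
```

The state of the tracked object can be checked with show track, and the active default route with show route.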

 

VoIP is UDP-based, and the ASA needs some configuration to tear down stale UDP connections.

On the ASA there are two timeouts to look at:

1- timeout floating-conn <- needed so the ASA terminates existing connections when the ISP fails

2- timeout h323 <- needed to terminate the H.323 TCP/UDP connections
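
As a rough sketch, the two timeouts could be set as shown below. The values are illustrative only and are not taken from this thread. To my knowledge, floating-conn is disabled (0) by default and 0:00:30 is the smallest non-zero value it accepts, while 0:05:00 is the usual h323 default, so check both against the command reference for 9.6 before changing anything.

```
! Illustrative values only; verify against the ASA 9.6 command reference.
! Close connections that are using a stale route so they can be rebuilt
! over the currently preferred route.
timeout floating-conn 0:00:30
! Idle time after which H.323 control connections are closed.
timeout h323 0:05:00
```

The resulting settings should be visible with show running-config timeout.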

Hope this helps.

MHM

Thank you for the suggestions. I will try these commands over the weekend.

tvotna
Spotlight

Very hard to tell, because 15.1(4)M8 is many many years old. Any particular reason you use this version?

It's highly unlikely that the issue is caused by the VPN itself. It's probably related to VoIP. I think you need to collect memory stats on the routers periodically to rule out a slow memory leak, and install a console server so that you can connect to one of them when failover happens and collect the stats again. Commands:

show memory statistics
show memory statistics history
show memory allocating-process totals
show proc memory sorted holding
show region

to begin with.
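
One possible way to collect these outputs periodically is an EEM applet on the router. This is only a sketch, not a tested configuration: the applet name MEM-SNAPSHOT, the one-hour interval, and the file name flash:memsnap.txt are placeholders, and it assumes that EEM is available on this image and that the flash file system accepts appended output.

```
! Hypothetical applet name, interval, and file name.
event manager applet MEM-SNAPSHOT
 ! Fire once per hour.
 event timer watchdog time 3600
 action 1.0 cli command "enable"
 ! Timestamp each snapshot, then append the memory outputs to one file.
 action 2.0 cli command "show clock | append flash:memsnap.txt"
 action 3.0 cli command "show memory statistics | append flash:memsnap.txt"
 action 4.0 cli command "show processes memory sorted holding | append flash:memsnap.txt"
```

Keep an eye on free flash space if this is left running, and note that the applet itself may stop working once the router is already out of memory, which is why the console server is still worth having.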

 

 

Thanks for your response and suggestions. We are using Cisco 38xx and 39xx routers for E1-to-H.323 VoIP calls in a closed network environment, and this IOS has been pretty stable for years. Hence we didn't move to newer versions.

These commands seem very helpful, and I will collect the outputs when we encounter the issue again. During a recent occurrence I collected the output of the following commands, which I am attaching here.

show memory summary

sh processes memory sorted

The strange thing is that I am seeing the CCSIP_SPI_CONTROL process holding the most memory even though we are not using SIP.

Right, if you search for CCSIP_SPI_CONTRO on cisco.com you'll find an enormous number of memory leak bugs. What you see doesn't look normal:

 PID TTY  Allocated      Freed    Holding Getbufs Retbufs Process
 384   0 2589217900 2017306092  283503608       0       0 CCSIP_SPI_CONTRO
   1   0  202021948     358168  201761856       0       0 Chunk Manager

The process still runs even if you are not using SIP, so it can still leak memory. Leaks in that situation are rare, but not impossible. Also, the process listens on TCP/5060, UDP/5060 and the SIP-TLS port. Your network is closed, but I'd still disable the corresponding ports if you don't need SIP. Refer to:

https://www.cisco.com/c/en/us/support/docs/csa/cisco-sa-sip-Cv28sQw2.html
https://www.cisco.com/c/en/us/support/docs/csa/cisco-sa-20190925-sip-dos.html
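
For routers that do not need SIP at all, the workaround described in Cisco SIP advisories of this kind is usually along the lines of the sketch below. Treat it as an assumption to verify against the linked documents and your IOS release: it only closes the SIP listening ports, and whether it has any effect on the memory held by CCSIP_SPI_CONTRO is not known from this thread.

```
! Sketch only: confirm against the advisories above before applying.
! Apply only if no dial-peer or other feature on the router relies on SIP.
sip-ua
 no transport udp
 no transport tcp
 no transport tcp tls
```

Afterwards, show sip-ua status should report the SIP transports as disabled. Test on one router and make a few H.323 calls before rolling it out more widely.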

Chunk Manager also stands out. Maybe

show chunk summary

will give you a clue.

You can also upgrade one of the routers to a more recent version and see if that helps.

Thank you for the feedback and suggestions. I will try upgrading one of the routers to a newer IOS and will also disable the SIP ports as described in the docs. I am just preparing the documentation and change requests to submit for approval.