ISE Monitoring Best Practice

chunhwon
Cisco Employee

Hi All,

My customer has had some service ports go down on ISE, which led to a service outage.

1.) According to the link below, ISE process status changes can be monitored via SNMP traps. Do we have a list of the processes being monitored?

https://www.cisco.com/c/en/us/td/docs/security/ise/2-1/admin_guide/b_ise_admin_guide_21/b_ise_admin_guide_20_chapter_011000.html#id_17078

2.) If processes are reported as down, what is the recommended action to restore service?

3.) I noticed that it's possible to issue a CLI command to restart ALL the ISE services, but I'm not sure how long that takes compared with rebooting the appliance. Which one would be faster and more effective?

4.) For monitoring ISE health using SNMP polling:

https://communities.cisco.com/message/256391#256391

Do we have any ISE monitoring best practice we can refer to?

Many thanks,
CH

8 Replies

Arne Bier
VIP

All excellent questions. 

I have also had PSN nodes with a failed application server limping along, and nobody noticed (except the customers screaming!).

There needs to be more attention given to the operational monitoring of ISE using familiar interfaces like SNMP traps, or a well curated SNMP MIB.  ISE has one SNMP trap (disk threshold). 

I think SYSLOGs are the only useful notification you'll get from ISE and you have to ensure that you send the correct ones, and then filter on the critical stuff.
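As a rough illustration of "filter on the critical stuff", here is a minimal Python sketch that keeps only syslog lines at severity "error" or worse. It relies only on the standard syslog PRI field (severity = PRI mod 8). The sample lines, the function names, and the severity threshold are placeholders I made up, not real ISE messages or a recommended setting.

```python
import re

# Standard syslog: PRI = facility * 8 + severity; a lower severity number is more urgent.
PRI_RE = re.compile(r"^<(\d{1,3})>")
MAX_SEVERITY = 3  # assumption: keep error (3), critical (2), alert (1), emergency (0)


def severity_of(line):
    """Return the syslog severity (0-7) of a raw line, or None if it has no PRI header."""
    match = PRI_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)) % 8


def critical_only(lines, max_severity=MAX_SEVERITY):
    """Keep only lines whose severity is at or below max_severity."""
    kept = []
    for line in lines:
        sev = severity_of(line)
        if sev is not None and sev <= max_severity:
            kept.append(line)
    return kept


# Placeholder lines, NOT real ISE output.
SAMPLE_LINES = [
    "<182>placeholder informational message",  # 182 % 8 == 6 (informational)
    "<179>placeholder error message",          # 179 % 8 == 3 (error)
    "<178>placeholder critical message",       # 178 % 8 == 2 (critical)
    "line without a PRI header",
]

if __name__ == "__main__":
    for entry in critical_only(SAMPLE_LINES):
        print(entry)
```

In practice you would still have to decide which ISE logging categories to forward in the first place; the filter only narrows what the collector alerts on.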

I have PRTG monitoring my deployment, but it really just pings the nodes and reports any loss of responses.  If you wanted to get really fancy, you could perform regular RADIUS authentications of an AD user.  That would perhaps test PSN<->AD, but these are just scratching the surface.  I don't believe there is any way I can know via SNMP that the applications on ISE are suffering.  I would have to infer this by trending my memory/CPU consumption and wondering what's going on.
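To make the "infer it by trending" idea a bit more concrete, here is a minimal Python sketch that fits a least-squares line through polled memory samples and flags sustained growth. Everything in it is an assumption for illustration: the sample format (epoch seconds, percent used), the function names, and the 0.5 %/hour threshold. How you collect the samples (SNMP polling, PRTG export, and so on) is out of scope here.

```python
def slope_per_hour(samples):
    """Least-squares slope of used-percent over time, in percent per hour.

    samples: list of (epoch_seconds, used_percent) tuples.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t / 3600.0 for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def looks_like_leak(samples, threshold_pct_per_hour=0.5):
    """Hypothetical rule of thumb: steady growth above the threshold is suspicious."""
    return slope_per_hour(samples) >= threshold_pct_per_hour


if __name__ == "__main__":
    # Made-up data: memory grows 2 percentage points per hour for six hours.
    growing = [(hour * 3600, 40.0 + 2.0 * hour) for hour in range(6)]
    print(slope_per_hour(growing), looks_like_leak(growing))
```

A rising trend is only a hint that something is wrong; it does not tell you which ISE process is affected.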

In most cases a reboot may be required to solve a tricky memory leak issue and it may add an extra 2 minutes to the downtime. 

hslai
Cisco Employee
  1. Adding to Arne's reply, see "Monitor ISE Processes".
  2. First, check which process is down and confirm it is not a false alarm. Second, quickly review the logs to see if they give any clues. Third, try restarting the ISE services and/or engage Cisco TAC, if needed.
  3. Some earlier ISE releases might not shut down the ISE services gracefully before a reload, so I would recommend stopping the ISE services before performing a reload. Because a reload also restarts the operating system, it takes a bit longer than an ISE service restart, but it might help clear up some issues at the OS level.
  4. No, there is no such best-practices doc today. Many Cisco Live presentations do give some tips and tricks in related areas.

No argument on the need for more remote, app-based monitoring capabilities via SNMP or API.  Yes, syslog can provide details on health and other services.  And yes, performing health monitor checks is generally the best way to validate that a service is working, and it is a standard mechanism when load balancers are deployed.

However, minor correction on the comment "there is only one trap".  Here is a quick example of traps sent just at boot time that I captured from my lab server...

/Craig

Not to belabour the point, but the example you cited is the result of the coldstart feature of the NET-SNMP server.

One can provoke this behaviour by simply enabling and disabling SNMP on the ISE CLI (snmp-server enable). 

However, the value of this is questionable because it only happens when you restart the SNMP daemon.  The OS doesn't send a trap when the interface goes down (e.g. when you disable the NIC in VMware).  You might also expect a trap when it comes back up again, as a sign-of-life notification.

My point is that the ISE CLI only offers one configurable SNMP trap:

ise-01/admin(config)# snmp-server trap ?

  dskThresholdLimit  SNMP Trap for disk threshold

I have seen a few others during an ISE server reboot that tell you that the application server is running (and a few other processes).  That is quite useful.  But when I manually restart the ISE application, I don't get any traps.

It would be useful to have some examples of SNMP traps that inform us of things happening while the server is operational (i.e. not after a reboot).   E.g. when application ise fails, is an SNMP trap sent? 

Is there an authoritative list of traps that the ISE platform sends?

It's been a while. Is anyone familiar with ways to monitor the services running on a node? We had the application server fail on a PSN but no one noticed, and I'd like to try to catch these in the future.

Arne Bier
VIP

AFAIK, there is no syslog or SNMP trap that can inform an NMS of such a process failure.

We still have to drop to the CLI to issue the command "show application status ise" to see that a process is "not running" or "initializing" etc.
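If you do end up collecting that CLI output (for example over SSH from a script), a small parser can at least turn it into something an NMS can alert on. The Python sketch below is only that: the column layout, the sample text, and the choice to treat "disabled" as healthy are my assumptions, and the sample is hand-written, not captured from a real node.

```python
import re

# Assumption: a service that is intentionally disabled should not raise an alert.
HEALTHY_STATES = {"running", "disabled"}


def parse_status(text):
    """Map process name -> state, assuming columns are separated by 2+ spaces."""
    rows = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("-"):
            continue  # skip blank lines and the dashed separator
        if line.upper().startswith("ISE PROCESS NAME"):
            continue  # skip the header row
        parts = re.split(r"\s{2,}", line)
        if len(parts) < 2:
            continue
        rows[parts[0]] = parts[1].lower()
    return rows


def unhealthy(rows):
    """Return only the processes whose state is not in HEALTHY_STATES."""
    return {name: state for name, state in rows.items() if state not in HEALTHY_STATES}


# Hand-written illustrative sample, NOT real output.
SAMPLE = """\
ISE PROCESS NAME                       STATE            PROCESS ID
--------------------------------------------------------------------
Database Listener                      running          3688
Application Server                     not running
M&T Log Collector                      initializing
pxGrid Infrastructure Service          disabled
"""

if __name__ == "__main__":
    for name, state in unhealthy(parse_status(SAMPLE)).items():
        print(f"ALERT: {name} is {state}")
```

Splitting on runs of two or more spaces keeps a state like "not running" intact. Check the real output on your release before relying on this, since the layout may differ.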

And no MIB to query either, AFAIK. Nor any REST API call that could be used.  You might get away with doing a REST API call to a PSN to see if its web interface is still responding to a simple dummy HTTP request. However, that doesn't prove if RADIUS is working. So, what I have seen is customers sending synthetic RADIUS and TACACS+ requests to their important PSNs to check for health. But that's a very expensive way to monitor. A better way would be to expose the process status via SNMP or REST.  Or an SNMP trap (assuming SNMP is still working) about the death of some ISE process would be nice.
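For the cheap end of that spectrum, here is a minimal Python sketch of a TCP reachability probe against a PSN. It only shows that something is accepting connections on a port; as noted above, it proves nothing about RADIUS actually working (RADIUS is UDP anyway). The hostname and the port list in the usage lines are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def probe(host, ports, timeout=3.0):
    """Return a dict of port -> reachable for each port in ports."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical PSN name; 443 (HTTPS) and 49 (TACACS+) are only examples.
    for port, is_open in probe("psn01.example.com", [443, 49]).items():
        print(f"port {port}: {'open' if is_open else 'DOWN'}")
```

A synthetic RADIUS or TACACS+ authentication remains the more meaningful test; this kind of probe is just an inexpensive first signal.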

Perhaps it's coming or I missed it since I last checked. 

 

After I posted my question I stumbled across a forum post on SolarWinds' site, and I believe someone said that SNMP traps can be sent with some detail on the Application Server/processes; it just can't be polled. But now I can't find that forum post, lol.

I noticed you had another post about SNMP traps in ISE 2.7. Did you get that sorted, and if so, do you know if traps could provide those details? I'm not great with SNMP. Would that be in the MIB as well?

Hi

Did you make any progress regarding monitoring the Application Server/processes?