High volume of CUOM alerts after reboot of server?

Esto
Level 1

We rebooted our Unity server this morning, which caused an outage of approximately 3 minutes. However, CUOM generated approximately 48-50 alerts for this quick bounce, which seems excessive for something that brief. It generated numerous Service Down messages and "TemperatureHigh" messages (all categorized as Critical). I am not sure why the temperature messages were generated; no other servers are having temperature issues. So I have two questions: is there any reason why these temperature messages would be generated, and how can we streamline the messaging so we are not inundated with alerts? iLO shows "server reset, server restored", and that seems much more appropriate.

 

A secondary question regarding the temperature alerts: 127 degrees Celsius is just over 260 degrees Fahrenheit. Unless I am reading this incorrectly, it seems very odd that we would get a message like this for a server that is not experiencing temperature issues.

SEVERITY               = Critical

MANAGED OBJECT         = X.X.X.X

EVENT DESCRIPTION       = TemperatureHigh::Component= TEMP-; TemperatureSensorLocation= cpu; TemperatureCelsiusThreshold= 127; RelativeTemperatureThreshold= 10 %; TemperatureCelsius= 255 DEGC
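
As a rough check of the numbers in the alert above, here is a minimal Python sketch that splits the event description into fields and converts the Celsius values to Fahrenheit. The function names are hypothetical, and the idea that 127 and 255 might be placeholder values (they are the 8-bit limits 0x7F and 0xFF) rather than real readings is only an assumption, not documented CUOM behavior.

```python
import re

# Event description copied from the alert above.
EVENT = ("TemperatureHigh::Component= TEMP-; TemperatureSensorLocation= cpu; "
         "TemperatureCelsiusThreshold= 127; RelativeTemperatureThreshold= 10 %; "
         "TemperatureCelsius= 255 DEGC")

def parse_event(description):
    """Split 'Name::key= value; key= value' into (name, dict of fields)."""
    name, _, body = description.partition("::")
    fields = {}
    for part in body.split(";"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return name, fields

def leading_int(text):
    """Return the integer at the start of a value such as '255 DEGC', or None."""
    match = re.match(r"-?\d+", text)
    return int(match.group()) if match else None

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def looks_like_placeholder(value):
    """Assumption: 127 (0x7F) and 255 (0xFF) may be 8-bit fill values, not readings."""
    return value in (0x7F, 0xFF)

name, fields = parse_event(EVENT)
threshold = leading_int(fields["TemperatureCelsiusThreshold"])
reading = leading_int(fields["TemperatureCelsius"])
print(name, threshold, c_to_f(threshold), reading, c_to_f(reading))
```

Both the threshold and the reported reading sit exactly on 8-bit limits, which may hint that the sensor could not be read during the reboot, but that would need to be confirmed against the server's own sensor data.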

1 Reply

Esto
Level 1

It has been over 2 months and I never got a response on this one. I am currently customizing the alerting, but I am still not sure how to tweak the frequency (i.e., a single event generating multiple copies of the same alert message). The temperature issue still doesn't make a lot of sense either; there were no temperature issues with this server.
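
To illustrate the kind of frequency control being asked about, here is a minimal Python sketch of time-window suppression, assuming the alerts are forwarded to a script before anyone is notified. The `AlertSuppressor` class and the 300-second window are hypothetical and are not CUOM settings; whether CUOM offers an equivalent built-in option is not something this sketch answers.

```python
class AlertSuppressor:
    """Let one alert per (device, event) through per time window; drop repeats."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.last_sent = {}  # (device, event_name) -> time the last alert was sent

    def should_send(self, device, event_name, now):
        """Return True if this alert should be forwarded, False if it is a repeat."""
        key = (device, event_name)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # same alert was already sent inside the window
        self.last_sent[key] = now
        return True


suppressor = AlertSuppressor(window_seconds=300)
# Timestamps are seconds since the reboot; the device name is the masked one from the alert.
for t in (0, 20, 40, 400):
    print(t, suppressor.should_send("X.X.X.X", "TemperatureHigh", now=t))
```

With a window longer than the outage, a 3-minute reboot would produce one notification per distinct event type per device instead of a repeat for every poll.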