
Server lost redundancy - [F0174]P2_MEM01_MEMHOT: A PSU fault/Thermal

RobertPivac
Level 1

Hi People, 

 

I'm getting this error on a UCS C220-M4S. Has anyone faced this issue, and is there a workaround?

 
Please refer to the screenshots attached.
 
[F0174][cleared][equipment-inoperable][sys/rack-unit-1/board/cpu-1] P1_PROCHOT: A PSU fault/Thermal event/MCE on the CPU might have occurred. Please check PECI over DMI, PMBUS_ALERT and PX_TEMP sensors to determine the fault source.: Cleared
1 Reply

Kirk J
Cisco Employee

The various sensor names/tests get triggered when the reporting system is no longer responding.

The majority of the time, this happens when a bad DIMM triggers a system freeze (CATERRN).

You might want to check your SEL logs to see if you have recently had a spate of correctable errors, or an uncorrectable error, logged on any of your DIMM slots. It usually takes a power cycle or a power drain to get out of that state.

Obviously, DIMM errors like these would require a DIMM replacement.
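
If the SEL has been exported to a text file, a rough way to spot a slot with a run of errors is to tally correctable and uncorrectable entries per DIMM. The sketch below is only an illustration: the slot naming (`DIMM_A1` style), the sample line layout, and the helper name `count_dimm_ecc_events` are assumptions, not the actual CIMC SEL format, so the pattern would need adjusting to match what your logs really show.

```python
import re
from collections import Counter

# Assumed slot naming such as "DIMM_A1"; adjust to match your real SEL text.
DIMM_RE = re.compile(r"\b(DIMM_[A-Z]\d+)\b", re.IGNORECASE)


def count_dimm_ecc_events(sel_lines):
    """Tally correctable and uncorrectable error lines per DIMM slot.

    Hypothetical helper: it relies only on the words "correctable" and
    "uncorrectable" appearing in the line, plus the assumed slot pattern.
    """
    counts = {"correctable": Counter(), "uncorrectable": Counter()}
    for line in sel_lines:
        match = DIMM_RE.search(line)
        if match is None:
            continue  # line does not name a DIMM slot
        lowered = line.lower()
        # Check "uncorrectable" first, since it contains "correctable".
        if "uncorrectable" in lowered:
            kind = "uncorrectable"
        elif "correctable" in lowered:
            kind = "correctable"
        else:
            continue
        counts[kind][match.group(1).upper()] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "Warning DIMM_A1 correctable ECC error",
        "Warning DIMM_A1 correctable ECC error",
        "Critical DIMM_B2 uncorrectable ECC error",
        "Informational P1_PROCHOT asserted",
    ]
    for kind, per_slot in count_dimm_ecc_events(sample).items():
        for slot, n in per_slot.most_common():
            print(f"{slot}: {n} {kind}")
```

A slot that shows an uncorrectable entry, or a steadily growing correctable count, would be the first candidate for replacement.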

 

Kirk...