Error in Project : ERROR_HEALTH_CHECK_TIMER_EXPIRED

John Palmason
Level 4

I wonder if anybody else is having this issue. I am trying to perfect my PnP configs, and I have been adding features as I go to introduce complexity to my project and help me prepare for a real deployment.

I have 8 devices in my lab: 3 routers and 5 switches (including a 4-member switch stack). I have successfully deployed image upgrades and basic configuration templates. I am now moving on to assigning IP addresses, non-VLAN 1 configurations, and EtherChannel configurations.

I started running into this error after I changed my switching configuration to include a management VLAN with a basic default route. I have the pnp startup-vlan XX command on the upstream switch. One of my switches gets configured correctly with the static management VLAN and IP. The others come back with an error in the APIC-EM controller about not being able to manage the device due to a timeout (see the error below). I found a bug ID with this error, but I don't think it fits the problem I am seeing.

https://bst.cloudapps.cisco.com/bugsearch/bug/CSCuy57714/?referring_site=bugquickviewredir

Error message in the project page:

#######################################

Failed health check since device is stuck in non-terminal state PROVISIONING_CONFIG for more than threshold time: 0 hours, 16 minutes, 0 seconds

#######################################

After waiting for the configuration to deploy, if I log in to the console I see this message, and the job errors out in APIC-EM. If I answer no at the prompt, I can see the configuration was delivered to the switch, with the exception of enabling SSH.

#######################################

         --- System Configuration Dialog ---

Enable secret warning

----------------------------------

In order to access the device manager, an enable secret is required

If you enter the initial configuration dialog, you will be prompted for the enable secret

If you choose not to enter the intial configuration dialog, or if you exit setup without setting the enable secret,

please set an enable secret using the following CLI in configuration mode-

enable secret 0 <cleartext password>

----------------------------------

Would you like to enter the initial configuration dialog? [yes/no]:

Has anybody else noticed this behaviour?

JP

1 Accepted Solution

OK. Sounds like a config file issue.

You should be able to ping the default gateway from the switch, as it is in the same VLAN. Does that work?

If not, check that the VLAN is being trunked between the two switches.

aradford
Cisco Employee

Hi JP,

there are a couple of reasons this might occur.

What model switch are you using and which version of code is it booting with?

Also, are you able to share a sanitised version of the config please?

Also, when doing testing, it is important that you "clean up" the device properly...

Here is a "full clean up" for a stack, to run in addition to wr er (write erase). I am going to post a PnP stacking blog later today.

# remove the certificates on active and standby

delete /force nvram:*.cer

delete /force stby-nvram:*.cer

# remove the VLAN database from active and standby

delete /force flash-1:vlan.dat

delete /force flash-2:vlan.dat

# remove the RSA key pairs.  NOTE: you will not be able to SSH after this

conf t

crypto key zeroize

yes

end

Hello Adam, after testing your clean-up suggestions I am still getting to the same point in the deployment. It seems that during the cutover to the VLAN, the PnP server gets disconnected and the PnP process stops. Can you please review the console logs and tell me what you see?

The hardware: WS-C3650-48PD - cat3k_caa-universalk9.SPA.03.07.04.E.152-3.E4.bin

Nov  7 23:40:21.884: %PNPA-DHCP Op-43 Msg: OK to process message

Nov  7 23:40:21.884: XML-UPDOWN: PNPA_DHCP_OP43 XML Interface(102) UP. PID=399

Nov  7 23:40:21.884: %PNPA-DHCP Op-43 Msg: _pdoon.1.ntf.don=399

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdoop.1.org=[A1D;B2;K4;I172.X.X.X;J80]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdgfa.1.inp=[B2;K4;I172.X.X.X;J80]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdgfa.1.B2.s12=[ ipv4 ]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdgfa.1.K4.htp=[ transport http ]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdgfa.1.Ix.srv.ip.rm=[ 172.X.X.X ]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdgfa.1.Jx.srv.rt.rm=[ port 80 ]

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdoop.1.ztp=[pnp-zero-touch] host=[] ipad=[172.X.X.X] port=80

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pors.done=1

Nov  7 23:40:21.891: %PNPA-DHCP Op-43 Msg: _pdokp.1.now=PNPA_DHCP_OP43 pid=399

Nov  7 23:40:21.891: XML-UPDOWN: PNPA_DHCP_OP43 XML Interface(102) SHUTDOWN(101). PID=399

Nov  7 23:40:21.936: %DHCP-6-ADDRESS_ASSIGN: Interface Vlan107 assigned DHCP address 172.X.X.X, mask 255.255.255.0, hostname

Nov  7 23:40:37.689: AUTOINSTALL: Obtain siaddr 172.16.X.X (as config server)

Nov  7 23:40:47.696: %PNP-6-HTTP_CONNECTING: PnP Discovery trying to connect to PnP server http://172.X.X.X:80/pnp/HELLO

Nov  7 23:40:47.707: %PNP-6-HTTP_CONNECTED: PnP Discovery connected to PnP server http://172.X.X.X:80/pnp/HELLO

Nov  7 23:40:48.724: %PNP-6-PROFILE_CONFIG: PnP Discovery profile pnp-zero-touch configured

%Error opening tftp://172.16.33.139/network-confg (Timed out)

Nov  7 23:41:23.212: %SYS-6-CLOCKUPDATE: System clock has been updated from 23:41:19 UTC Mon Nov 7 2016 to 23:41:23 UTC Mon Nov 7 2016, configured from console by console.

Nov  7 23:41:31.992: %PKI-4-NOCONFIGAUTOSAVE: Configuration was modified.  Issue "write memory" to save new IOS PKI configuration

%Error opening tftp://172.16.33.139/cisconet.cfg (Timed out)

%Error opening tftp://172.16.33.139/router-confg (Timed out)

%Error opening tftp://172.16.33.139/ciscortr.cfg (Timed out)

000038: Nov  7 23:44:14 UTC: %SYS-5-LOG_CONFIG_CHANGE: Buffer logging: level informational, xml disabled, filtering disabled, size (65000)

000039: Nov  7 23:44:14 UTC: %SYS-5-LOG_CONFIG_CHANGE: Monitor logging: level informational, xml disabled, filtering disabled

000040: Nov  7 15:44:14 PST: %SYS-6-CLOCKUPDATE: System clock has been updated from 23:44:14 UTC Mon Nov 7 2016 to 15:44:14 PST Mon Nov 7 2016, configured from console by vty1.

000041: Nov  7 15:44:14 PST: %SYS-6-CLOCKUPDATE: System clock has been updated from 15:44:14 PST Mon Nov 7 2016 to 15:44:14 PST Mon Nov 7 2016, configured from console by vty1.

000042: Nov  7 15:44:14 PST: %SW_VLAN-6-VTP_DOMAIN_NAME_CHG: VTP domain name changed to company.corp.

000043: Nov  7 15:44:15 PST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan107, changed state to down

000047: Nov  7 15:44:20 PST: %PARSER-4-BADCFG: Unexpected end of configuration file.

000048: Nov  7 15:44:20 PST: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 172.1X.X.X port 514 started - CLI initiated

000049: Nov  7 15:44:20 PST: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 172.X.X.X port 514 started - CLI initiated

000050: Nov  7 15:44:23 PST: %SYS-6-LOGGINGHOST_STARTSTOP: Logging to host 172.X.X.X port 514 started - CLI initiated

000051: Nov  7 15:44:45 PST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan107, changed state to up

000052: Nov  7 15:44:49 PST: %PNPA-DHCP Op-43 Msg: _papdo.2.eRr.ena

000053: Nov  7 15:44:49 PST: %PNPA-DHCP Op-43 Msg: _pdoon.2.eRr.pdo=-1
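In a long console capture like the one above, the lines that matter are easy to miss. The following is a minimal sketch, not an official tool, that pulls out IOS syslog messages at severity 4 (warning) or more severe. The function name `warnings_and_errors` and the regular expression are assumptions made for illustration; lines that do not carry a standard `%FACILITY-SEVERITY-MNEMONIC:` tag (such as the `%PNPA-DHCP Op-43 Msg` lines) are ignored.

```python
import re

# Standard IOS syslog tag: %FACILITY-SEVERITY-MNEMONIC: (lower severity number = more severe).
SYSLOG_TAG = re.compile(r"%([A-Z0-9_]+)-([0-7])-([A-Z0-9_]+):")


def warnings_and_errors(log_text, max_severity=4):
    """Return (severity, tag, line) tuples for messages at or below max_severity."""
    hits = []
    for line in log_text.splitlines():
        match = SYSLOG_TAG.search(line)
        if match and int(match.group(2)) <= max_severity:
            tag = f"{match.group(1)}-{match.group(2)}-{match.group(3)}"
            hits.append((int(match.group(2)), tag, line.strip()))
    return hits


# Three lines copied from the console log in this thread.
sample = """\
Nov  7 23:40:47.707: %PNP-6-HTTP_CONNECTED: PnP Discovery connected to PnP server http://172.X.X.X:80/pnp/HELLO
000043: Nov  7 15:44:15 PST: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan107, changed state to down
000047: Nov  7 15:44:20 PST: %PARSER-4-BADCFG: Unexpected end of configuration file.
"""

for severity, tag, line in warnings_and_errors(sample):
    print(severity, tag)  # prints: 4 PARSER-4-BADCFG
```

Run against the full capture, this would also surface the %PKI-4-NOCONFIGAUTOSAVE line, which is a warning but not necessarily related to the timeout.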

Here is the config:

service nagle

no service config

no service pad

service tcp-keepalives-in

service timestamps debug datetime localtime show-timezone

service timestamps log datetime localtime show-timezone

service password-encryption

service sequence-numbers

!

hostname STG-BR-S-01

!

boot-start-marker

boot-end-marker

!

!

vrf definition Mgmt-vrf

!

address-family ipv4

exit-address-family

!

address-family ipv6

exit-address-family

!

logging buffered 65000 informational

logging monitor informational

enable secret 5 XXXXXXX

!

username swadmin privilege 15 secret 5 XXXXXXXXXX

aaa new-model

!

!

aaa authentication login default group tacacs+ local

aaa accounting update periodic 1

aaa accounting exec default start-stop group tacacs+

aaa accounting commands 15 default start-stop group tacacs+

aaa accounting connection default start-stop group tacacs+

no aaa accounting system guarantee-first        

!

!

!

aaa session-id common

clock timezone PST -8 0

clock summer-time PDT recurring

no ip domain-lookup

ip domain-name bcferries.corp

!

!

login on-failure log

vtp domain XXXXXXX.corp

vtp mode transparent

authentication mac-move permit

port-channel load-balance src-ip

spanning-tree mode pvst

spanning-tree extend system-id

!

redundancy

mode sso

interface Vlan107

ip address X.X.X.X 255.255.255.0

ip helper-address X.X.X.X

no ip redirects

no ip route-cache

ip default-gateway X.X.X.X

ip forward-protocol nd

no ip http server

no ip http secure-server

ip ssh version 2

ip scp server enable

!

!

ip sla enable reaction-alerts

logging source-interface Vlan1

logging host X.X.X.X

!

!

snmp ifmib ifindex persist

tacacs server ACS-01

address ipv4 X.X.X.X

key 7 XXXXXXXX

tacacs server ACS-02

address ipv4 X.X.X.X

key 7 XXXXXXXXX

!

!

!

line con 0

exec-timeout 30 0

privilege level 15

stopbits 1

line vty 0 4

exec-timeout 30 0

privilege level 15

transport input ssh

line vty 5 15

exec-timeout 30 0

privilege level 15

transport input ssh

!

ntp server X.X.X.X

Hi JP

You need to have "end" as the last line of the config file :-)

This error message is an indication:

000047: Nov  7 15:44:20 PST: %PARSER-4-BADCFG: Unexpected end of configuration file.

Let me know if that resolves the problem.

Adam
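The "end" requirement is easy to check before a template is uploaded. Here is a small sketch of such a pre-flight check; the helper name `ends_with_end` is made up for this example, and it only looks at the template text, treating blank lines and `!` comment lines as ignorable.

```python
def ends_with_end(config_text):
    """True when the last non-blank, non-comment line of the config is 'end'."""
    lines = [ln.strip() for ln in config_text.splitlines()]
    meaningful = [ln for ln in lines if ln and not ln.startswith("!")]
    return bool(meaningful) and meaningful[-1] == "end"


# A shortened stand-in for the template posted above (no trailing "end").
template = "hostname STG-BR-S-01\n!\nntp server X.X.X.X\n"

print(ends_with_end(template))             # False
print(ends_with_end(template + "end\n"))   # True
```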

Thanks Adam.

Adding end cleared this error message: 000047: Nov  7 15:44:20 PST: %PARSER-4-BADCFG: Unexpected end of configuration file.

But I am still left with the original error: the deployment times out after 15 minutes.

Any other suggestions?

Is the credential in the config file the same one as APIC-EM would use to discover the device?

username swadmin privilege 15 secret 5 XXXXXXXXXX

Also, do you have vlan 1 configured on this device? 

I see you are using it as the logging source?

logging source-interface Vlan1


If pnp startup-vlan does the right thing, you should have VLAN 1 shut down, VLAN 107 enabled, and any active interfaces in VLAN 107.


I can see vlan 107 being created in the pnp logs.


Adam
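The Vlan1 logging source question can be turned into a quick template check. The sketch below lists every interface named in a `source-interface` line that has no matching `interface` stanza in the same file. The function name is hypothetical, and the check only inspects the template text; it cannot know which interfaces already exist on the device by default.

```python
import re


def undefined_source_interfaces(config_text):
    """Return interfaces named in 'source-interface X' lines with no 'interface X' stanza."""
    defined = set()
    referenced = []
    for raw in config_text.splitlines():
        line = raw.strip()
        stanza = re.match(r"interface\s+(\S+)$", line)
        if stanza:
            defined.add(stanza.group(1).lower())
            continue
        ref = re.search(r"source-interface\s+(\S+)$", line)
        if ref:
            referenced.append(ref.group(1))
    return [name for name in referenced if name.lower() not in defined]


# Three lines taken from the config posted in this thread.
config = """\
interface Vlan107
ip address X.X.X.X 255.255.255.0
logging source-interface Vlan1
"""

print(undefined_source_interfaces(config))  # ['Vlan1']
```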


In my testing, I kept the configuration the same and only changed the section under VLAN 107 to ip address dhcp instead of a static address. With this change, the deployment completes with a state of "provisioned".

The thing I am noticing is that my switches with L3 code work perfectly with this configuration, but my L2 ones all fail. I have tried testing the reachability of the device, and I am missing something. I can't route off this device even with the ip route 0.0.0.0 0.0.0.0 X.X.X.X command entered; I have also configured the default gateway command.

I think using Vlan1 as the logging source is a mistake, and I will remove that line.

JP

OK. Sounds like a config file issue.

You should be able to ping the default gateway from the switch, as it is in the same VLAN. Does that work?

If not, check that the VLAN is being trunked between the two switches.
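Before testing the ping, it may be worth confirming that the static address and the default gateway in the template really are in the same subnet, since a gateway outside the SVI's subnet would not be reachable at Layer 2. This is a rough sketch using Python's standard ipaddress module; the function name and the sample addresses (taken from documentation ranges, because the real ones are masked in this thread) are assumptions.

```python
import ipaddress


def gateway_on_link(svi_ip, mask, gateway):
    """True when the gateway falls inside the subnet configured on the SVI."""
    network = ipaddress.ip_interface(f"{svi_ip}/{mask}").network
    return ipaddress.ip_address(gateway) in network


# Hypothetical values standing in for the masked X.X.X.X addresses.
print(gateway_on_link("192.0.2.10", "255.255.255.0", "192.0.2.1"))     # True
print(gateway_on_link("192.0.2.10", "255.255.255.0", "198.51.100.1"))  # False
```

If this check passes and the ping still fails, the trunking between the two switches is the next thing to look at, as suggested above.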