02-27-2017 06:07 AM - edited 03-01-2019 04:36 AM
Hi,
I am unable to access the APIC-EM GUI after an attempted upgrade to 1.4. I cannot access the development console either; it just hangs on login. This is the second time this has happened: I previously tried the upgrade, hit the same issue, and had to restore a previous snapshot to get it back up and running. I can see a number of services listed as harvested. I have completed multiple reboots (it's a 3-node cluster; each node has 32 GB of RAM and 500 GB of HDD space, and disk I/O is 316 MB/s).
The RCA file is 238 MB. Can I upload it somewhere? I would prefer not to reset to defaults, as I have over 2000 devices added.
All the services seem to be running:
[Mon Feb 27 11:08:49 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
$ sudo service grapevine status
grapevine is running
grapevine_capacity_manager RUNNING pid 4138, uptime 0:53:36
grapevine_capacity_manager_lxc_plugin RUNNING pid 4143, uptime 0:53:36
grapevine_cassandra RUNNING pid 3273, uptime 0:54:26
grapevine_client RUNNING pid 3268, uptime 0:54:26
grapevine_coordinator_service RUNNING pid 3282, uptime 0:54:26
grapevine_dlx_service RUNNING pid 3279, uptime 0:54:26
grapevine_log_collector RUNNING pid 3283, uptime 0:54:26
grapevine_root RUNNING pid 4154, uptime 0:53:35
grapevine_supervisor_event_listener RUNNING pid 3267, uptime 0:54:26
grapevine_ui RUNNING pid 3434, uptime 0:54:25
reverse-proxy=4.0.2.509 RUNNING pid 3272, uptime 0:54:26
router=4.0.2.509 RUNNING pid 3277, uptime 0:54:26
(grapevine)
[Mon Feb 27 11:12:46 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
$ dd if=/dev/zero of=/tmp/foo bs=1M count=512 conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.70002 s, 316 MB/s
(grapevine)
[Mon Feb 27 11:14:24 UTC] grapevine@10.36.12.145 (grapevine-root-1) ~
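For anyone checking a similar status listing across several hosts, the rows can be scanned programmatically. This is a minimal sketch, assuming the output follows the supervisor-style `name STATE details` layout seen above; the state names are the ones supervisord documents, and `parse_status` and `not_running` are hypothetical helper names, not part of any Cisco tool.

```python
import re

# Process states as documented for supervisord. Assumption: the grapevine
# status listing uses the same vocabulary, since it resembles that output.
STATES = ("STOPPED", "STARTING", "RUNNING", "BACKOFF",
          "STOPPING", "EXITED", "FATAL", "UNKNOWN")
LINE_RE = re.compile(r"^(\S+)\s+(" + "|".join(STATES) + r")\b")


def parse_status(text):
    """Return {service_name: state} for every line that looks like a status row."""
    result = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # skips banners such as "grapevine is running" and prompts
            result[match.group(1)] = match.group(2)
    return result


def not_running(text):
    """Return the sorted names of services whose state is not RUNNING."""
    return sorted(name for name, state in parse_status(text).items()
                  if state != "RUNNING")


# A few rows copied from the status output in this post.
SAMPLE = """grapevine is running
grapevine_cassandra RUNNING pid 3273, uptime 0:54:26
grapevine_root RUNNING pid 4154, uptime 0:53:35
reverse-proxy=4.0.2.509 RUNNING pid 3272, uptime 0:54:26
(grapevine)
"""

print(not_running(SAMPLE))
```

On the rows above this prints an empty list, which matches the observation that the host-level services are up. It says nothing about the application services that show as harvested.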
02-27-2017 07:59 AM
I suspect all of your services did not start. Did you remove the hosts from the cluster prior to the upgrade?
You can try running /home/grapevine/bin/reset_grapevine. Answer no to all the prompts and wait about an hour.
Thanks
02-27-2017 02:15 PM
Hi Nicholas,
I tried that, waited a few hours but still hit the same issue - I'll run it again to see if it makes a difference.
Regards,
Brian
02-27-2017 03:10 PM
Brian,
If this is a single host cluster then here is a shot in the dark.
Log in as grapevine
cd bin
./harvest_all_clients
grape config update enable_policy true
./grow_all_services
Wait
02-27-2017 03:19 PM
Double reply
I re-read the thread and see that it is a multi-host cluster. I was informed that we do not support upgrades in a multi-host configuration (I have not found this documented).
From the 1.4 release notes though:
In case a failure occurs on a multi-host cluster during any software updates (Linux files) and you have not increased the idle timeout using the GUI, then perform the following steps:
1) Log into each host and enter the following command: $ sudo cat /proc/net/xt_recent/ROGUE | awk '{print $1}'
Note: This command will list all IP addresses that have been automatically blocked by the internal firewall because requests from these IP addresses have exceeded a predetermined threshold.
2) If the command in Step 1 returns an IP address, then perform a reboot on the host where the above command has been entered (same host as the user is logged in).
Note: The hosts should be rebooted in a synchronous order and never two hosts rebooted at the same time.
After the host or hosts reboot, upload the software update package file to the controller again using the GUI.
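The check in Step 1 can also be scripted when there are several hosts to inspect. This is a minimal sketch, assuming the usual Linux xt_recent line layout (`src=<ip> ttl: <n> last_seen: ...`), which is why the awk command prints the first field. `blocked_ips` and `read_blocked_ips` are hypothetical helper names, and the sample addresses come from the documentation range, not from a real controller.

```python
import re

# Assumed xt_recent row layout: "src=<ip> ttl: <n> last_seen: <jiffies> ..."
SRC_RE = re.compile(r"^src=(\S+)")


def blocked_ips(text):
    """Extract the source addresses from the contents of an xt_recent list."""
    ips = []
    for line in text.splitlines():
        match = SRC_RE.match(line.strip())
        if match:
            ips.append(match.group(1))
    return ips


def read_blocked_ips(path="/proc/net/xt_recent/ROGUE"):
    """Read the list on the local host; this normally requires root."""
    with open(path) as handle:
        return blocked_ips(handle.read())


# Hypothetical sample content, for illustration only.
SAMPLE = (
    "src=192.0.2.10 ttl: 64 last_seen: 4298426 oldest_pkt: 1 4298426\n"
    "src=192.0.2.11 ttl: 64 last_seen: 4298500 oldest_pkt: 1 4298500\n"
)

for ip in blocked_ips(SAMPLE):
    print(ip)
```

An empty result means Step 2 does not apply to that host. A non-empty result means that host is a candidate for the one-at-a-time reboot described in the note.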
02-28-2017 05:48 AM
Hi Nicholas,
I tried the above; no IPs were listed. I also shut down all the services and then the hosts, and brought them back up in the reverse order as per the 1.4 recommendations. Then I brought the services back up. I can get to the development GUI now, and the fault seems to be with the Task service, which is constantly failing, even if I try to grow it manually. I tried resetting grapevine again, but the service still fails:
Service could not be started. Refer to the service logs for more details (service=task-service, version=4.1.2.37, client_id=9361a77b-0968-41ed-bdd6-e39b276b18f3)
Where can I view the service logs?
02-28-2017 07:08 AM
I suggest opening a case at this point. I have seen a few other upgrade problems and it is best to track them.
Thanks!
02-27-2017 01:21 PM
What was the base release from which you upgraded the iWAN App? As of release 1.3.2 we implemented app separation, whereby the APIC-EM platform releases and iWAN App releases have been de-coupled. In other words, if you upgraded ONLY the iWAN App and kept the APIC-EM platform release the same, it most likely will not work. To use iWAN App 1.4, you first have to upgrade APIC-EM to release 1.4 and then, on top of it, upgrade the iWAN App to release 1.4.
02-27-2017 02:17 PM
Hi Chakrapani,
This has nothing to do with the iWAN app - I was upgrading the APIC-EM base software, not iWAN. Not sure if you read my full post or not. But I do understand the de-coupling of the iWAN app from the main software. Thanks for replying.
Regards,
Brian