ualbdp2c1
Level 4

We struggled for more than a year with an issue related to HA over WAN. Even a brief network outage (1-2 seconds) between data centers caused the master role to switch from the publisher to the subscriber; mastership changed any time the nodes went into island mode. We finally found the root cause after numerous TAC cases, and I think it's worth sharing because changing the hostnames is a supported operation.

The logic to elect the master node when recovering from island mode is:

Step 1. Check the service status of the nodes. If node 1 is IN_SERVICE and node 2 is in PARTIAL_SERVICE, node 1 becomes the master. If the states are the same (IN_SERVICE or PARTIAL_SERVICE), go to step 2.

Step 2. Compare the hardware specifications of both nodes. The server with the better specification becomes the master. If the hardware specifications are the same, go to step 3.

Step 3. The publisher (i.e. node 1) will become the master if the hostname matches the PrimaryEngineComputerName in the ClusterSpecificConfig. If there is no match, go to step 4.

Step 4. Make the subscriber master.
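
The four steps above can be sketched as a small function. This is only an illustration of the decision order as TAC described it to us, not the actual engine code. The Node class, the numeric hw_score, the exact string comparison of hostnames, and the example hostnames are all assumptions made up for the sketch.

```python
from dataclasses import dataclass

# Hypothetical ranking of the two service states named in the steps above.
STATE_RANK = {"IN_SERVICE": 2, "PARTIAL_SERVICE": 1}


@dataclass
class Node:
    hostname: str
    state: str      # "IN_SERVICE" or "PARTIAL_SERVICE"
    hw_score: int   # assumed numeric stand-in for "hardware specification"


def elect_master(publisher: Node, subscriber: Node,
                 primary_engine_computer_name: str) -> Node:
    """Pick the master after island mode, following steps 1-4 in order."""
    # Step 1: the node in the better service state wins.
    if STATE_RANK[publisher.state] != STATE_RANK[subscriber.state]:
        return max((publisher, subscriber), key=lambda n: STATE_RANK[n.state])

    # Step 2: the node with the better hardware wins.
    if publisher.hw_score != subscriber.hw_score:
        return max((publisher, subscriber), key=lambda n: n.hw_score)

    # Step 3: the publisher wins only if its hostname matches the stored name.
    # (Exact string comparison is an assumption of this sketch.)
    if publisher.hostname == primary_engine_computer_name:
        return publisher

    # Step 4: otherwise the subscriber becomes master.
    return subscriber


# Renamed publisher, identical state and hardware (hostnames are made up).
pub = Node("uccx-new-pub", "IN_SERVICE", 1)
sub = Node("uccx-new-sub", "IN_SERVICE", 1)
print(elect_master(pub, sub, "uccx-old-pub").hostname)  # uccx-new-sub
```

With identical service state and hardware on both nodes, the outcome depends entirely on the hostname comparison in step 3, so a stale stored name hands mastership to the subscriber every time.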

Alright, now let's back up about 18 months. We decided to build a new cluster to migrate from version 9.0 to 10.6. The intent was to simplify the transition from CAD to Finesse. We built a new cluster on 9.0, restored from backup, changed the hostnames and IPs, and then upgraded to 10.6.  

Back to the present (now on version 11.6), we finally lucked out with a TAC engineer who explained the island mode recovery process above. The engineer used the CET (Configuration Editor Tool) to check the ClusterSpecificConfig file, where we discovered the primaryEngineComputerName was still the hostname of our 9.0 publisher. Because our current publisher's hostname never matched that value, step 3 above always failed and the subscriber was elected master as a result.

[Image: PrimaryEngineComputerName.png]

TAC advised that this value is populated at the time of install and cannot be changed (even with root access). Since the 9.0 cluster had been offline for over a year, the best solution was to rename the 11.6 publisher to match the primaryEngineComputerName.

In conclusion, changing the hostname is supported, but it will cause issues with HA. Ideally, the set network hostname command should be updated to also change the primaryEngineComputerName. I asked to have a caveat added to the appropriate section of the admin guide. It's unclear if this request reached the technical writing team – I never received confirmation.
