Thursday, September 23, 2010


A VMware HA cluster consists of two kinds of nodes: primary and secondary. Primary nodes hold the cluster settings and all "node states", which are synchronized between the primaries. A node state contains, for instance, resource usage information. If vCenter is not available, the primary nodes still have a rough estimate of resource usage and can take it into account when a fail-over needs to occur. Secondary nodes send their state information to the primary nodes.
Nodes send heartbeats to each other; this is the mechanism used to detect possible outages. Primary nodes send heartbeats to both primary and secondary nodes, while secondary nodes send their heartbeats to primary nodes only. By default, nodes send a heartbeat every second. This interval can be changed with the advanced setting das.failuredetectioninterval (in milliseconds, default 1000), set under Advanced Settings on your HA cluster.
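To make the heartbeat topology concrete, here is a minimal Python sketch that enumerates which node sends heartbeats to which. It is a hypothetical model, not VMware code: heartbeat_pairs and heartbeats_per_minute are illustrative names, and it assumes das.failuredetectioninterval is expressed in milliseconds with a default of 1000.

```python
# Hypothetical model of the HA heartbeat topology; not VMware code.

# Assumed default of das.failuredetectioninterval, in milliseconds.
DEFAULT_HEARTBEAT_INTERVAL_MS = 1000


def heartbeat_pairs(primaries, secondaries):
    """Return the set of (sender, receiver) heartbeat pairs."""
    pairs = set()
    # Primaries send heartbeats to every other primary and to every secondary.
    for sender in primaries:
        for receiver in list(primaries) + list(secondaries):
            if receiver != sender:
                pairs.add((sender, receiver))
    # Secondaries send heartbeats to primaries only.
    for sender in secondaries:
        for receiver in primaries:
            pairs.add((sender, receiver))
    return pairs


def heartbeats_per_minute(interval_ms=DEFAULT_HEARTBEAT_INTERVAL_MS):
    """Heartbeats a node sends per minute to each of its receivers."""
    return 60_000 // interval_ms


pairs = heartbeat_pairs(["esx1", "esx2"], ["esx3", "esx4"])
print(len(pairs))  # 10
print(("esx3", "esx4") in pairs)  # False: secondaries never heartbeat each other
```

With two primaries and two secondaries, each primary sends to three nodes and each secondary to two, which gives the ten pairs above.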
The first 5 hosts that join the VMware HA cluster are automatically selected as primary nodes; all other hosts become secondary nodes. When you reconfigure HA, the primary and secondary nodes are selected again, at random. The vCenter client does not show which hosts are primaries, but this can be revealed from the Service Console:
cat /var/log/vmware/aam/aam_config_util_listnodes.log
Another method of showing the primary nodes is:
/opt/vmware/aam/bin/Cli (ftcli on earlier versions)
AAM> ln
The limit of 5 primaries is a soft limit: you can manually add a 6th, but this is not supported.
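The selection rule can be sketched as follows. This is a hypothetical Python model (MAX_PRIMARIES, initial_roles and reconfigure_roles are illustrative names, not part of the AAM agent), and random.sample merely stands in for the random re-selection that happens on a reconfigure.

```python
import random

# Soft limit on the number of primaries described in the text.
MAX_PRIMARIES = 5


def initial_roles(join_order):
    """The first MAX_PRIMARIES hosts to join become primaries, the rest secondaries."""
    primaries = list(join_order[:MAX_PRIMARIES])
    secondaries = list(join_order[MAX_PRIMARIES:])
    return primaries, secondaries


def reconfigure_roles(hosts, rng=random):
    """On an HA reconfigure, pick the primaries again at random."""
    count = min(MAX_PRIMARIES, len(hosts))
    primaries = rng.sample(list(hosts), count)
    secondaries = [h for h in hosts if h not in primaries]
    return primaries, secondaries


hosts = [f"esx{i}" for i in range(1, 8)]  # hosts listed in join order
print(initial_roles(hosts))
# (['esx1', 'esx2', 'esx3', 'esx4', 'esx5'], ['esx6', 'esx7'])
print(reconfigure_roles(hosts, random.Random(42)))
```

Passing a seeded random.Random makes the "random" reconfigure reproducible when experimenting with the model.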
To promote a node:
/opt/vmware/aam/bin/Cli (ftcli on earlier versions)
AAM> promotenode
To demote a node:
/opt/vmware/aam/bin/Cli (ftcli on earlier versions)
AAM> demotenode
A secondary host is only promoted when a primary host is put in "Maintenance Mode", disconnected from the cluster or removed from the cluster, or when you reconfigure HA.
If all primary hosts fail simultaneously, no HA-initiated restart of the VMs will take place, because HA needs at least one primary host to restart VMs. With 5 primaries, 5 simultaneous host failures could take out every primary, which is why you can take at most four host failures into account when configuring HA.
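The "four host failures" limit follows from a worst-case argument. The hypothetical Python sketch below checks it by brute force: it enumerates every combination of failed hosts and finds the largest number of failures that is guaranteed to leave at least one primary running. restart_possible and tolerated_failures are illustrative names.

```python
from itertools import combinations


def restart_possible(primaries, failed_hosts):
    """HA can only restart VMs if at least one primary is still running."""
    return any(p not in failed_hosts for p in primaries)


def tolerated_failures(primaries, hosts):
    """Largest k such that every set of k failed hosts leaves a primary alive."""
    k = 0
    while k < len(hosts) and all(
        restart_possible(primaries, set(failed))
        for failed in combinations(hosts, k + 1)
    ):
        k += 1
    return k


hosts = [f"esx{i}" for i in range(1, 9)]  # an 8-host cluster
primaries = hosts[:5]                     # the 5 primaries
print(tolerated_failures(primaries, hosts))  # 4
```

Adding more secondaries does not change the result: the worst case is always the one where all failed hosts happen to be primaries.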
At least one primary is needed because the "fail-over coordinator" role, also described as the "active primary", is assigned to a primary. The fail-over coordinator coordinates the restart of VMs on the remaining primary and secondary hosts and takes restart priorities into account. Keep in mind that when two hosts fail at the same time, the coordinator handles the restarts sequentially: it first restarts the VMs of the host that failed first (taking restart priorities into account) and then restarts the VMs of the host that failed second (again taking restart priorities into account). If the fail-over coordinator fails, one of the other primaries takes over.
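The sequential restart order can be modelled like this. It is a simplified, hypothetical sketch: the priority values mirror the HA VM restart priority settings (High, Medium, Low, Disabled), while FailedHost and restart_sequence are illustrative names. The real coordinator also has to consider placement and available resources, which this model ignores.

```python
from dataclasses import dataclass

# Lower rank restarts first; "disabled" VMs are not restarted at all.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class FailedHost:
    name: str
    failure_order: int  # 1 = the host that failed first
    vms: dict           # VM name -> restart priority


def restart_sequence(failed_hosts):
    """Handle failed hosts one after another, each in restart-priority order."""
    sequence = []
    for host in sorted(failed_hosts, key=lambda h: h.failure_order):
        restartable = [(vm, prio) for vm, prio in host.vms.items() if prio != "disabled"]
        restartable.sort(key=lambda item: PRIORITY_RANK[item[1]])
        sequence.extend(vm for vm, _ in restartable)
    return sequence


first = FailedHost("esx1", 1, {"web01": "low", "db01": "high", "test01": "disabled"})
second = FailedHost("esx2", 2, {"dc01": "high", "app01": "medium"})
print(restart_sequence([second, first]))
# ['db01', 'web01', 'dc01', 'app01']
```

In this model a low-priority VM on the first failed host is restarted before a high-priority VM on the second failed host, which is the consequence of handling the hosts sequentially.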


das.isolationaddress[x] – IP address the ESX host uses to check for isolation when no heartbeats are received, where [x] = 1-10. VMware HA uses the default gateway as an isolation address and the provided values as additional checkpoints. It is recommended to add an isolation address when a secondary service console is used for redundancy purposes.
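The isolation check can be sketched as follows. This hypothetical Python model assumes the advanced settings are available as a plain dictionary and that reachability is tested by a caller-supplied function standing in for the ping the host sends; the function names are illustrative and the addresses are documentation examples.

```python
def isolation_addresses(default_gateway, advanced_settings):
    """Default gateway first, then any das.isolationaddress1..10 values."""
    addresses = [default_gateway]
    for x in range(1, 11):
        value = advanced_settings.get(f"das.isolationaddress{x}")
        if value:
            addresses.append(value)
    return addresses


def is_isolated(heartbeats_received, addresses, is_reachable):
    """A host without heartbeats is isolated only if no isolation address answers."""
    if heartbeats_received:
        return False
    return not any(is_reachable(address) for address in addresses)


settings = {"das.isolationaddress1": "192.0.2.250"}
addresses = isolation_addresses("192.0.2.1", settings)
print(addresses)  # ['192.0.2.1', '192.0.2.250']
print(is_isolated(False, addresses, lambda a: a == "192.0.2.250"))  # False
print(is_isolated(False, addresses, lambda a: False))  # True
```

The extra address is what keeps a host from declaring itself isolated when only the default gateway is unreachable.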


The isolation response determines what happens to the VMs on a host that has detected it is isolated from the network:
Power off – When a network isolation occurs, all VMs on the isolated host are powered off. It is a hard stop.
Shut down – When a network isolation occurs, all VMs running on that host are shut down via VMware Tools. If this is not successful within 5 minutes, a "power off" is executed.
Leave powered on – When a network isolation occurs on the host, the state of its VMs remains unchanged.
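The three isolation responses can be summarized in a small, hypothetical Python sketch. It only models the decision described above, including the 5-minute fallback from a VMware Tools shutdown to a power off; IsolationResponse and isolation_actions are illustrative names and the code does not talk to any VMware API.

```python
from enum import Enum

# Time VMware Tools gets to shut the guest down, per the text (5 minutes).
SHUTDOWN_TIMEOUT_S = 300


class IsolationResponse(Enum):
    POWER_OFF = "Power off"
    SHUT_DOWN = "Shut down"
    LEAVE_POWERED_ON = "Leave powered on"


def isolation_actions(response, guest_shutdown_s=None):
    """Actions applied to one VM on an isolated host.

    guest_shutdown_s: seconds the guest needed to shut down via VMware Tools,
    or None if the shutdown never completed.
    """
    if response is IsolationResponse.POWER_OFF:
        return ["power off"]  # hard stop
    if response is IsolationResponse.SHUT_DOWN:
        if guest_shutdown_s is not None and guest_shutdown_s <= SHUTDOWN_TIMEOUT_S:
            return ["guest shutdown"]
        return ["guest shutdown", "power off"]  # fallback after the timeout
    return []  # leave powered on: VM state unchanged


print(isolation_actions(IsolationResponse.SHUT_DOWN, guest_shutdown_s=90))
# ['guest shutdown']
print(isolation_actions(IsolationResponse.SHUT_DOWN))
# ['guest shutdown', 'power off']
```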


http://www.yellow-bricks.com/vmware-high-availability-deepdiv/#HA-primariesandsecondaries
