MSSQL Failover Cluster Build Failure : An error occurred while creating the cluster and the nodes will be cleaned up.
Recently I was working on building an MSSQL Failover cluster and encountered an issue.
Just to give a little background: I have built more than 50 clusters in my 16+ years as a Windows system administrator. Most of them were built on a pair of physical servers with shared storage from a SAN and a Node and Disk Majority quorum, so I know at least the basics of how a cluster operates. I apologize in advance if you think I should have performed additional troubleshooting.
The current setup was:
- Two Dell PowerEdge 750 servers with RAID 1 for the OS (BOSS disks).
- Heartbeat cabling is in place and both nodes communicate over their private IPs.
- A CNO has been pre-staged and kept in a disabled state.
- Storage is presented from the SAN over dual-port HBAs, including one 1 GB LDEV for quorum.
I started the cluster build process and added both eligible nodes for validation.
The validation passed without any errors, but when I moved on to create the cluster, it waited for a couple of seconds and then failed with the following error.
Error : An error occurred while creating the cluster and the nodes will be cleaned up. Please wait...
After that I was presented with the error below:
There was an error cleaning up the cluster nodes. Use Clear-ClusterNode to manually clean up the nodes.
An error occurred while creating the cluster.
An error occurred creating cluster 'MSSQLCLXXXXXXXX'.
The specified server cannot perform the requested operation
I tried using PowerShell cmdlets to create the cluster but, as expected, that also failed with the same error.
Unfortunately I was not able to capture the error message or screenshots, but here is what I have from the PowerShell transcript, which is enabled by default on the servers and captures every command execution.
I could not figure out what was going wrong or why the cluster was not getting formed. I performed the following prechecks again to make sure everything was in order.
- Checked if a DC is reachable from both nodes. ==> PASS
- Checked communication on Private and Public NIC between both nodes ==> PASS
- Checked necessary DNS Resolution is working from both nodes. ==> PASS
- Checked if SAN Storage can be brought online and offline on both nodes ==> PASS
- Full control security permissions for both Computer Objects on the CNO in AD ==> PASS
Even with everything set up and working properly, the build still failed. I started thinking about taking a network capture from the node during cluster creation and having it analyzed by some of my networking friends, to see whether something expected at the network layer was not working properly.
That step depended on someone with networking knowledge (which I don't have), so at the back of my mind I was still thinking: what could be wrong?
I then went back to basics and launched a Command Prompt as Administrator.
I typed in the command netstat -ano and started the cluster creation wizard again.
I filtered the netstat -ano output to set aside the ESTABLISHED and LISTENING entries and looked at what was left.
I found a connection attempt to an IP address on TCP port 3268 that was stuck in the SYN_SENT state.
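As a rough illustration of the filtering step above, here is a minimal Python sketch that parses netstat -ano output and keeps only the TCP rows in a given state. The function name pending_connections, the sample rows, the documentation-range IP addresses and the PID are all made up for illustration; they are not taken from my servers.

```python
def pending_connections(netstat_output, state="SYN_SENT"):
    """Return (remote_ip, remote_port, pid) for each TCP row in the given state."""
    results = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows from `netstat -ano` have five columns:
        # Proto, Local Address, Foreign Address, State, PID.
        # Header lines and UDP rows do not match and are skipped.
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue
        _proto, _local, remote, row_state, pid = fields
        if row_state != state:
            continue
        # Split on the last colon so IPv6 addresses like [::1]:3268 also work.
        host, _, port = remote.rpartition(":")
        results.append((host, int(port), int(pid)))
    return results


# Illustrative sample only (hypothetical addresses and PIDs).
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       1012
  TCP    192.0.2.10:49721       192.0.2.53:389         ESTABLISHED     4321
  TCP    192.0.2.10:49733       192.0.2.53:3268        SYN_SENT        4321
"""

print(pending_connections(SAMPLE))
```

A row that stays in SYN_SENT means the node sent a connection request and never got an answer, which is typical when a firewall silently drops the traffic.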
I know that 3268 is the port number for the Global Catalog, and the IP address belonged to one of the DCs in my environment.
This was the moment I figured out that cluster formation depends on the Global Catalog: unless it is reachable on TCP port 3268 during cluster formation, the cluster will not be formed and the process will fail.
I worked with the Active Directory folks and learned that the Global Catalog ports had been restricted for the whole environment, and connectivity is whitelisted only upon request with a valid justification. (I did not try to dig into the reason.)
After Global Catalog connectivity was allowed, I attempted to create the cluster again and it worked without any issues.
So the summary is:
Connectivity to the Global Catalog on TCP port 3268 is also necessary when building an MSCS Failover Cluster.
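If you want to check this before starting a build, a plain TCP connect attempt is enough to tell whether the port is reachable from a node. Below is a minimal sketch in Python, assuming Python is available on the node. The helper name can_connect and the 3-second timeout are my own choices, and the host names are whatever DCs you pass on the command line. A successful connect only shows that the port is open, not that the Global Catalog answers queries correctly.

```python
import socket
import sys

GLOBAL_CATALOG_PORT = 3268  # TCP port used by the Global Catalog


def can_connect(host, port=GLOBAL_CATALOG_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (dropped packets)
        # and name resolution failures.
        return False


if __name__ == "__main__":
    # Pass one or more DC host names or IP addresses as arguments.
    for dc in sys.argv[1:]:
        status = "reachable" if can_connect(dc) else "NOT reachable"
        print(f"{dc}:{GLOBAL_CATALOG_PORT} {status}")
```

Saved under a hypothetical name such as gc_check.py, it could be run from each node as: python gc_check.py followed by the DC names to test.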
Happy troubleshooting !!!