Failover-Cluster (SQL): What to do when DNS registration does not work after update

ralfbrooks78
Aug 21, 2023
4 min read

The Failover Cluster service independently registers the "A record" for the cluster alias name after the DHCP client has acquired the IP address. By default the PTR record is not registered, so the PublishPTRRecords property has to be set for it to be registered going forward. Please note: even if the Cluster service registered the PTR record by default, it might not be able to override the record registered by DHCP if aging and scavenging are enabled on the DNS server, or if only secure updates are allowed.

Failover-Cluster (SQL): No DNS registration after update


Cluster node 01 thus retained its roles instead of handing them back to cluster node 02 via script after the server restarted. In addition, the update could not be installed on [Cluster node] 02. So this means: [Cluster node] 01 had the February [2019] update and [Cluster node] 02 had the January [2019] update. The active node was [Cluster node] 01, with the February update.

It's obvious that the updates are related, isn't it? Unfortunately, I didn't realize this until after I had started to follow up on the February update that wasn't successfully installed. Let's see what happens after the reboot.

Our other cluster was due for the problem as well, so I waited until it started throwing event ID 1257 again, then just deleted the DNS record for that cluster's main CNO. Similar to before, once it tried again (it retries every 15 minutes after a failure) it registered in DNS, but now the "DHCP" role owns it. It's almost like I'm having the exact opposite issue from you, where my main CNO DNS object is now 'owned' by the roles it maintains rather than by itself. So far, no errors have occurred when registering the roles' DNS records, but it might be that I'm not waiting long enough after the main CNO stops being able to update itself.

Hi Joe, sorry for the delay in response. There are currently two different issues being tracked by Microsoft which result in event ID 1257. In Server 2016, we are seeing the 1257 error occur after password resets. An initial fix for this was released in summer 2021, and an update to it was released in November 2021 and enabled in the March 2022 release. One of the updates added the ability to use the Virtual Computer Object (VCO), i.e. the server name used for individual resources, as well as the Cluster Name Object (CNO), as an authenticating resource in order to update DNS entries. The behavior we would see here was that if the CNO was taken offline and brought back online, either manually or by failover, the issue would cease until the next CNO password change. If this behavior was noted, then workarounds such as deleting and recreating the DNS object, with the checkbox to allow any authenticated account to update it, would not fix it long term.

The entries on the DNS server would look like the following after the registration. To view the DNS entries on your local DNS server, click on Start -> Administrative Tools -> DNS. This will open the DNS Manager GUI.

In Windows Server 2008, DNS registration behaved slightly differently. If the Network Name had been registered within the last 24 hours, it would reregister with DNS 10 minutes after coming online. If it had not been registered within the past day, it was registered immediately when the resource came online.
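The Windows Server 2008 timing rule can be summarised as a small decision function. This is only an illustrative Python sketch of the rule as described here, not code from the Cluster service; the function name `registration_delay` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the description of the Windows Server 2008 behavior.
RECENT_WINDOW = timedelta(hours=24)
REREGISTER_DELAY = timedelta(minutes=10)


def registration_delay(last_registered: Optional[datetime], online_at: datetime) -> timedelta:
    """Return how long after coming online the Network Name registers in DNS.

    Hypothetical model: `last_registered` is None if the name was never registered.
    """
    if last_registered is not None and online_at - last_registered < RECENT_WINDOW:
        # Registered within the last day: wait 10 minutes before reregistering.
        return REREGISTER_DELAY
    # Not registered within the past day: register immediately.
    return timedelta(0)
```

A name that was registered two hours before coming online would therefore wait ten minutes, while one last registered three days earlier would register straight away.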

The issue, it turns out after days of forehead slapping, is that there were some additional external DNS servers specified on the NIC for that particular node, a remnant of its pre-cluster days. Your cluster node network interfaces should only list your internal DNS servers. While the extra entries didn't cause issues in most cases, for the purpose of cluster network name registration they were wreaking havoc.
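A check like this is easy to automate once you have the list of DNS servers configured on each node's NIC. The following Python sketch assumes you have already collected those addresses by some other means; the function name `external_dns_servers` and the example addresses are made up for illustration.

```python
import ipaddress
from typing import Iterable, List


def external_dns_servers(configured: Iterable[str], internal: Iterable[str]) -> List[str]:
    """Return configured DNS server addresses that are not in the internal allow-list."""
    allowed = {ipaddress.ip_address(a) for a in internal}
    return [a for a in configured if ipaddress.ip_address(a) not in allowed]


# Example addresses are hypothetical.
leftovers = external_dns_servers(
    configured=["10.0.0.10", "10.0.0.11", "8.8.8.8"],
    internal=["10.0.0.10", "10.0.0.11"],
)
print(leftovers)  # ['8.8.8.8']
```

Any address the function returns is a candidate for removal from the node's NIC settings.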

In this situation if an application does attempt an update or delete during replication, then the request will merely return an error. Deletes and writes will become possible again after replication is complete.

The Certificate Revocation List is likely to change over time. Work with your security department to set up a mechanism to update the revocation list on all nodes in the cluster in a timely manner. A reload of every node in the cluster is required after the revocation list has been updated.

The time for which a region will block updates after reaching the StoreFile limit defined by hbase.hstore.blockingStoreFiles. After this time has elapsed, the region will stop blocking updates even if a compaction has not been completed.
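The time-bounded blocking described above can be modelled roughly as follows. This is a simplified Python sketch of the rule as stated, not HBase code; the function and parameter names are hypothetical, and the numbers in the usage note are arbitrary.

```python
def updates_blocked(limit_reached: bool, blocked_for_ms: int, max_block_ms: int) -> bool:
    """Model of the time-bounded block.

    Updates are blocked only while the StoreFile limit is reached AND the
    configured blocking time has not yet elapsed. Once the time is up, updates
    are allowed again even if no compaction has completed.
    """
    return limit_reached and blocked_for_ms < max_block_ms
```

With an arbitrary 90,000 ms limit, `updates_blocked(True, 30_000, 90_000)` is `True`, while `updates_blocked(True, 90_000, 90_000)` is `False` because the wait has elapsed.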

Although splitting the region is a local decision made by the RegionServer, the split process itself must coordinate with many actors. The RegionServer notifies the Master before and after the split, updates the .META. table so that clients can discover the new daughter regions, and rearranges the directory structure and data files in HDFS. Splitting is a multi-task process. To enable rollback in case of an error, the RegionServer keeps an in-memory journal about the execution state. The steps taken by the RegionServer to execute the split are illustrated in RegionServer Split Process. Each step is labeled with its step number. Actions from RegionServers or Master are shown in red, while actions from the clients are shown in green.
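The idea of an in-memory journal that enables rollback can be shown with a generic pattern. The Python sketch below is not HBase's implementation; the helper `run_with_journal` and the step tuples are hypothetical, and it only demonstrates recording completed steps so they can be undone in reverse order when a later step fails.

```python
from typing import Callable, List, Tuple

# Each step is (name, do, undo); all three are supplied by the caller.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_journal(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure undo completed steps in reverse order.

    Returns the names of the steps left in the journal (empty after a rollback).
    """
    journal: List[Tuple[str, Callable[[], None]]] = []
    try:
        for name, do, undo in steps:
            do()
            # Record the step only after it has completed successfully.
            journal.append((name, undo))
    except Exception:
        # Roll back: undo the most recently completed step first.
        while journal:
            _, undo = journal.pop()
            undo()
    return [name for name, _ in journal]
```

If the third of three steps raises, the first two are undone in reverse order and the function returns an empty list.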
