DBMirror for SQL Server vs. Always On: Which Is Right for You?

Troubleshooting Common DBMirror for SQL Server Failover Issues

High availability with DBMirror for SQL Server can reduce downtime, but failovers sometimes behave unexpectedly. This article walks through the most common failover problems, how to diagnose them, and practical fixes to restore reliable automatic or manual failover.


1. Understanding DBMirror failover modes and prerequisites

DBMirror typically provides synchronous and asynchronous modes. As with native SQL Server database mirroring, automatic failover is supported only when synchronous replication is combined with a witness server. Common prerequisites:

  • Synchronous (High Safety) mode for automatic failover.
  • A properly configured witness server for automatic failover.
  • Matching database names, with the principal in the FULL recovery model and the mirror restored WITH NORECOVERY.
  • Network connectivity and firewall rules allowing SQL Server endpoints (TCP ports).
  • SQL Server service accounts with necessary permissions.

If these prerequisites aren’t met, failover may be impossible or unreliable.
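
As a rough way to check these prerequisites on an existing session, the following sketch flags whether each mirrored database currently looks eligible for automatic failover. It assumes DBMirror exposes its state through the standard sys.database_mirroring catalog view, as the other queries in this article do. The AutoFailoverReady column name is only illustrative.

```sql
-- Run on the principal. Returns one row per mirrored database.
SELECT DB_NAME(database_id)          AS DatabaseName,
       mirroring_role_desc,
       mirroring_safety_level_desc,   -- FULL = synchronous (High Safety)
       mirroring_state_desc,
       mirroring_witness_name,
       mirroring_witness_state_desc,
       CASE WHEN mirroring_safety_level_desc  = 'FULL'
             AND mirroring_state_desc         = 'SYNCHRONIZED'
             AND mirroring_witness_state_desc = 'CONNECTED'
            THEN 'YES' ELSE 'NO'
       END                           AS AutoFailoverReady  -- illustrative name
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;     -- ignore databases that are not mirrored
```

A NO in the last column does not say which prerequisite is missing, but the three columns before it usually make that obvious.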


2. Symptom: Automatic failover did not occur

Common causes

  • Witness is not connected or unreachable.
  • Mirroring operating in asynchronous (High Performance) mode.
  • Database state is not synchronized (SYNCHRONIZED required).
  • Network partitioning that prevents the mirror and the witness from forming quorum.

How to diagnose

  • Check the mirroring state in SQL Server Management Studio (SSMS) or run:
    
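    SELECT DB_NAME(database_id) AS DatabaseName,
           mirroring_state_desc,
           mirroring_role_desc,
           mirroring_safety_level_desc,
           mirroring_partner_name,
           mirroring_witness_name,
           mirroring_witness_state_desc,
           mirroring_role_sequence
    FROM sys.database_mirroring
    WHERE mirroring_guid IS NOT NULL; -- skip databases that are not mirrored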
  • Query SQL Server error logs and Windows Event Viewer for connectivity/timeouts.
  • From each instance, confirm TCP connectivity to partner and witness (telnet/netcat or Test-NetConnection).

Fixes

  • Ensure mirroring is set to HIGH SAFETY (synchronous) for automatic failover.
  • Verify the witness instance is online and reachable; fix firewall rules or DNS.
  • Resolve network partitions so the witness can form quorum with the principal or the mirror.
  • Bring the mirror to a SYNCHRONIZED state: check the log send and redo queues, resolve any blocking transactions, and resume mirroring if it is suspended.
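
The first two fixes might look like the following when run on the principal. This is a sketch, not a complete procedure: YourDB and the witness address are placeholders, and the port must match the mirroring endpoint on the witness instance.

```sql
-- Run on the principal server, from the master database.
USE master;

-- Switch the session to synchronous (High Safety) mode.
ALTER DATABASE YourDB SET PARTNER SAFETY FULL;

-- Point the session at the witness (placeholder address).
ALTER DATABASE YourDB SET WITNESS = 'TCP://witness_server.example.com:5022';
```

After both statements succeed, mirroring_witness_state_desc in sys.database_mirroring should report CONNECTED once the witness has joined the session.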

3. Symptom: Manual failover hangs or takes very long

Common causes

  • Long-running transactions preventing the mirror from reaching the SYNCHRONIZED state.
  • Large transaction log that takes a long time to send/apply.
  • Insufficient network bandwidth or latency spikes.
  • Mirroring suspended or database in a restoring/recovery state.

How to diagnose

  • Inspect sys.dm_tran_database_transactions and sys.dm_exec_requests for active transactions.
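  • Monitor the log send and redo queues. sys.database_mirroring does not expose queue sizes, so read them from the Database Mirroring performance counters (or from Database Mirroring Monitor):
    
    SELECT RTRIM(instance_name) AS DatabaseName,
           RTRIM(counter_name)  AS CounterName,
           cntr_value           AS SizeKB
    FROM sys.dm_os_performance_counters
    WHERE object_name LIKE '%Database Mirroring%'
      AND counter_name IN ('Log Send Queue KB', 'Redo Queue KB'); -- send queue matters on the principal, redo queue on the mirror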
  • Check wait stats and network metrics.
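
To make the first diagnostic step concrete, here is a minimal sketch that lists open transactions in the mirrored database, oldest first, along with the session that owns each one. YourDB is a placeholder, and the query needs VIEW SERVER STATE permission.

```sql
-- Open transactions that have written log in the mirrored database.
SELECT st.session_id,
       es.login_name,
       es.host_name,
       es.program_name,
       dt.database_transaction_begin_time,
       dt.database_transaction_log_bytes_used
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions  AS st ON st.transaction_id = dt.transaction_id
JOIN sys.dm_exec_sessions              AS es ON es.session_id     = st.session_id
WHERE dt.database_id = DB_ID('YourDB')                 -- placeholder database name
  AND dt.database_transaction_begin_time IS NOT NULL   -- NULL means no log written yet
ORDER BY dt.database_transaction_begin_time;
```

Sessions at the top of the result with a large database_transaction_log_bytes_used are the likeliest to delay a manual failover.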

Fixes

  • If long transactions exist, either wait for completion or, where acceptable, kill blocking sessions carefully.
  • Increase network throughput or schedule failovers during low-activity windows.
  • Keep transaction logs in check with regular log backups (verify the backups first), and avoid log-heavy operations such as index rebuilds immediately before a planned failover.
  • Resume mirroring if suspended:
    
    ALTER DATABASE YourDB SET PARTNER RESUME; 
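
Once the session is healthy again, a planned failover can be guarded so that it only runs when the database is actually ready. The sketch below assumes the standard database mirroring syntax applies; YourDB is a placeholder.

```sql
-- Run on the current principal, from the master database.
USE master;

IF EXISTS (SELECT 1
           FROM sys.database_mirroring
           WHERE database_id          = DB_ID('YourDB')
             AND mirroring_role_desc  = 'PRINCIPAL'
             AND mirroring_state_desc = 'SYNCHRONIZED')
BEGIN
    -- Swap roles: the mirror becomes the new principal.
    ALTER DATABASE YourDB SET PARTNER FAILOVER;
END
ELSE
BEGIN
    RAISERROR('YourDB is not a synchronized principal; manual failover skipped.', 16, 1);
END
```

Checking the state first avoids issuing a failover that would be rejected or left waiting on a backlog.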

4. Symptom: Database state shows SUSPENDED or DISCONNECTED

Common causes

  • Network interruptions between principal, mirror, or witness.
  • Mirroring endpoint stopped or misconfigured.
  • Log send/receive errors due to disk space or permission issues.
  • Version or patch mismatches causing communication problems.

How to diagnose

  • Review SQL Server error log for mirroring endpoint errors, authentication failures, or IO problems.
  • Check endpoint state:
    
    SELECT name, role_desc, state_desc FROM sys.database_mirroring_endpoints; 
  • Verify disk space and I/O errors on each server.
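
The endpoint check above does not show which port the endpoint listens on, which is often what a firewall rule needs. As a small extension, this sketch joins in sys.tcp_endpoints to add the port and the authentication and encryption settings.

```sql
-- Mirroring endpoint details, including the listening port.
SELECT dme.name,
       dme.role_desc,
       dme.state_desc,
       dme.connection_auth_desc,
       dme.encryption_algorithm_desc,
       te.port
FROM sys.database_mirroring_endpoints AS dme
JOIN sys.tcp_endpoints                AS te ON te.endpoint_id = dme.endpoint_id;
```

Run it on the principal, the mirror, and the witness, and compare the results: mismatched authentication or encryption settings between partners are a common reason for a DISCONNECTED state.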

Fixes

  • Restart the mirroring endpoint if necessary:
    
    ALTER ENDPOINT [Mirroring] STATE = STARTED; 
  • Re-establish the partner if the mirroring session was removed (preferably using the partner's fully qualified server address):
    
    ALTER DATABASE YourDB SET PARTNER = 'TCP://mirror_server:5022'; 
  • Free disk space and ensure SQL Server service account has proper permissions.
  • Apply matching SQL Server cumulative updates on both instances to avoid protocol mismatches.
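
When the error log shows authentication failures on the endpoint, the partner's service account may simply lack CONNECT permission. A minimal sketch of granting it follows. The account name CONTOSO\sqlsvc_partner is hypothetical, and [Mirroring] is the endpoint name used in the example above.

```sql
USE master;

-- Hypothetical Windows account that runs the partner's SQL Server service.
IF NOT EXISTS (SELECT 1 FROM sys.server_principals
               WHERE name = N'CONTOSO\sqlsvc_partner')
    CREATE LOGIN [CONTOSO\sqlsvc_partner] FROM WINDOWS;

-- Allow that account to connect to the local mirroring endpoint.
GRANT CONNECT ON ENDPOINT::[Mirroring] TO [CONTOSO\sqlsvc_partner];
```

The same grant is needed on each instance for the accounts of the other instances in the session.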

5. Symptom: Failover succeeds but applications fail to reconnect

Common causes

  • Applications use server name in connection string pointing to former principal.
  • DNS cache still points clients to the old IP, or the load balancer has not been updated.
  • Sessions relying on instance-level resources (linked servers, tempdb state) that change after failover.
  • Logins and permissions missing on the new principal.

How to diagnose

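  • Test connections against the current principal, both directly and with the failover partner syntax (the database name must be included for the failover partner to be used):
    • connection string example: Server=PrimaryServer; Failover Partner=MirrorServer; Database=YourDB;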
  • Check application error messages and client-side DNS caching.
  • Verify SQL logins and permissions exist on both instances:
    
    SELECT name, type_desc, sid FROM sys.server_principals WHERE type IN ('S','U','G'); 

Fixes

  • Use the Failover Partner connection string or DNS CNAME that points to the current principal.
  • Ensure SQL logins are synchronized between principal and mirror (use Microsoft's sp_help_revlogin script to script logins with their SIDs and password hashes).
  • Inform application teams about reconnection behavior and tune client connection retry logic.
  • For seamless reconnection, use listeners or virtual IPs where supported.
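
Missing or mismatched logins usually show up as orphaned database users on the new principal. The following sketch lists them. It has to run after failover, because the database is not accessible while it is the mirror. YourDB, app_user, and app_login are placeholder names.

```sql
USE YourDB;  -- placeholder database name

-- SQL-authenticated users with no login of the same SID on this instance.
SELECT dp.name AS OrphanedUser,
       dp.sid
FROM sys.database_principals    AS dp
LEFT JOIN sys.server_principals AS sp ON sp.sid = dp.sid
WHERE dp.type = 'S'
  AND dp.principal_id > 4    -- skip dbo, guest, INFORMATION_SCHEMA, sys
  AND sp.sid IS NULL;        -- may also list users created WITHOUT LOGIN

-- Remap one orphaned user to an existing login (hypothetical names):
-- ALTER USER [app_user] WITH LOGIN = [app_login];
```

Remapping fixes the symptom for one user. Creating logins with matching SIDs on both instances prevents it from recurring at the next failover.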

6. Symptom: Split-brain or both servers think they are principal

Common causes

  • The witness is lost and a network partition leaves both principal and mirror with enough local state to assume the principal role.
  • Manual misconfiguration or forced service start without proper checks.

How to diagnose

  • Check each node’s sys.database_mirroring.mirroring_role_desc and witness connectivity.
  • Review error logs for messages about forced service or safety violations.

Fixes

  • Restore network connectivity to the witness to regain quorum; manual intervention may be needed to reconcile divergent data.
  • If split-brain occurs, determine which node has the authoritative data (usually whichever was principal before the partition) and perform a planned role change or rebuild the affected copy.
  • Consider implementing stricter monitoring and alerts to detect witness or network failures early.
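
To compare what each node believes, the same query can be run on both servers and the results placed side by side. This is a sketch that relies only on columns of sys.database_mirroring. Interpreting the LSN values to decide which copy is authoritative still requires judgment.

```sql
-- Run on each partner and compare the output.
SELECT @@SERVERNAME          AS ServerName,
       DB_NAME(database_id)  AS DatabaseName,
       mirroring_role_desc,
       mirroring_state_desc,
       mirroring_role_sequence,   -- increments each time the partners switch roles
       mirroring_failover_lsn,
       mirroring_end_of_log_lsn
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```

Two rows for the same database that both report PRINCIPAL confirm the split, and a differing mirroring_role_sequence hints at which side changed roles most recently.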

7. Tools and scripts for ongoing monitoring

Key monitoring metrics

  • mirroring_state_desc, mirroring_role_desc
  • Log send queue and redo queue sizes (Log Send Queue KB and Redo Queue KB performance counters)
  • mirroring_witness_name and connectivity status
  • Endpoint state and error log warnings

Sample scheduled check (SQL):

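SELECT DB_NAME(database_id) AS DatabaseName,
       mirroring_state_desc,
       mirroring_role_desc,
       mirroring_safety_level_desc,
       mirroring_witness_name,
       mirroring_witness_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
-- Queue sizes are not columns of this view; read the Log Send Queue KB and
-- Redo Queue KB counters from sys.dm_os_performance_counters instead.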

Automate alerts when the log send queue or redo queue exceeds a threshold, or when the state becomes SUSPENDED or DISCONNECTED.
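
One way to automate such alerts is a SQL Server Agent job step that raises an error when a condition is met, so the job fails and notifies an operator. The sketch below assumes that setup. The 100 MB threshold is an arbitrary placeholder to tune for your workload.

```sql
DECLARE @MaxQueueKB int = 102400;   -- hypothetical threshold: 100 MB

-- Unhealthy mirroring state on any mirrored database.
IF EXISTS (SELECT 1
           FROM sys.database_mirroring
           WHERE mirroring_guid IS NOT NULL
             AND mirroring_state_desc IN ('SUSPENDED', 'DISCONNECTED'))
    RAISERROR('A mirrored database is SUSPENDED or DISCONNECTED.', 16, 1);

-- Send or redo queue above the threshold for any database.
IF EXISTS (SELECT 1
           FROM sys.dm_os_performance_counters
           WHERE object_name LIKE '%Database Mirroring%'
             AND counter_name IN ('Log Send Queue KB', 'Redo Queue KB')
             AND instance_name <> '_Total'
             AND cntr_value > @MaxQueueKB)
    RAISERROR('A mirroring queue exceeds %d KB.', 16, 1, @MaxQueueKB);
```

Severity 16 is enough to fail a T-SQL job step, which keeps the alerting logic in the job's notification settings rather than in the script.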


8. Preventive best practices

  • Use HIGH SAFETY with a witness for automatic failover when zero data loss is required.
  • Keep SQL Server versions and patch levels consistent across principal, mirror, and witness.
  • Ensure robust network design with redundant paths and low latency.
  • Synchronize logins, jobs, and maintenance scripts between instances.
  • Test failovers regularly during maintenance windows and document runbooks.
  • Monitor disk space, transaction log growth, and long-running transactions proactively.

9. When to consider alternatives

If DBMirror failover issues continue and your environment requires more flexible automatic failover, consider:

  • SQL Server Always On Availability Groups (for readable secondaries and database-level failover).
  • Failover Clustering (for instance-level HA).

Evaluate based on RPO/RTO requirements, licensing, and application compatibility.

Conclusion

Most DBMirror failover problems stem from networking, witness availability, transaction log backlogs, or configuration mismatches. Systematic checks of mirroring state, endpoints, logs, and network connectivity, combined with preventive monitoring and consistent patching, will resolve the majority of issues and improve failover reliability.
