DBMirror for SQL Server vs. Always On: Which Is Right for You?

Troubleshooting Common DBMirror for SQL Server Failover Issues

High availability with DBMirror for SQL Server can reduce downtime, but failovers sometimes behave unexpectedly. This article walks through the most common failover problems, how to diagnose them, and practical fixes to restore reliable automatic or manual failover.

1. Understanding DBMirror failover modes and prerequisites

DBMirror typically provides synchronous and asynchronous modes. As with native SQL Server database mirroring, automatic failover is supported only with synchronous replication and a witness server. Common prerequisites:

Synchronous (High Safety) mode for automatic failover.
A properly configured witness server for automatic failover.
Matching database names on principal and mirror, with the database in the FULL recovery model and the mirror copy restored WITH NORECOVERY.
Network connectivity and firewall rules allowing SQL Server endpoints (TCP ports).
SQL Server service accounts with necessary permissions.

If these prerequisites aren’t met, failover may be impossible or unreliable.
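
The mode, synchronization, and witness prerequisites can be checked in one pass. The following is a minimal sketch, assuming DBMirror exposes its state through the standard sys.database_mirroring catalog view, as the other queries in this article do. AutoFailoverReady is an illustrative column alias, not something SQL Server reports.

```sql
-- Run on the principal. A database is eligible for automatic failover only
-- when safety is FULL, the state is SYNCHRONIZED, and the witness is CONNECTED.
SELECT
    DB_NAME(database_id)         AS DatabaseName,
    mirroring_role_desc,
    mirroring_state_desc,
    mirroring_safety_level_desc,  -- FULL = synchronous (High Safety)
    mirroring_witness_name,
    mirroring_witness_state_desc, -- CONNECTED, DISCONNECTED, or UNKNOWN
    CASE
        WHEN mirroring_safety_level_desc = 'FULL'
         AND mirroring_state_desc = 'SYNCHRONIZED'
         AND mirroring_witness_state_desc = 'CONNECTED'
        THEN 'YES' ELSE 'NO'
    END                          AS AutoFailoverReady  -- illustrative alias
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL; -- skip databases that are not mirrored
```

Any row showing NO points to the prerequisite that needs attention before automatic failover can be relied on.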

2. Symptom: Automatic failover did not occur

Common causes

Witness is not connected or unreachable.
Mirroring operating in asynchronous (High Performance) mode.
Database state is not synchronized (SYNCHRONIZED required).
Network partitioning causing the witness to lose quorum.

How to diagnose

Check the mirroring state in SQL Server Management Studio (SSMS) or run:


SELECT DB_NAME(database_id) AS DatabaseName,
       mirroring_state_desc,
       mirroring_role_desc,
       mirroring_partner_name,
       mirroring_witness_name,
       mirroring_role_sequence
FROM sys.database_mirroring;

Query SQL Server error logs and Windows Event Viewer for connectivity/timeouts.
From each instance, confirm TCP connectivity to partner and witness (telnet/netcat or Test-NetConnection).

Fixes

Ensure mirroring is set to HIGH SAFETY (synchronous) for automatic failover.
Verify the witness instance is online and reachable; fix firewall rules or DNS.
Resolve network partitions so witness can form quorum with principal/mirror.
Bring the mirror to a SYNCHRONIZED state: check log send/receive, resolve any blocking transactions, and resume mirroring if suspended.
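
The first two fixes map to ALTER DATABASE options. This is a sketch under stated assumptions: YourDB is the placeholder database name used elsewhere in this article, and witness_server:5022 is a hypothetical witness address. Both statements are run on the principal.

```sql
-- Switch the mirroring session to synchronous (High Safety) mode.
ALTER DATABASE YourDB SET PARTNER SAFETY FULL;

-- Attach a witness to the session. The address is a placeholder; substitute
-- the address of the witness instance's mirroring endpoint.
ALTER DATABASE YourDB SET WITNESS = 'TCP://witness_server:5022';
```

After running these, re-check the mirroring state to confirm the session reports SYNCHRONIZED before relying on automatic failover.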

3. Symptom: Manual failover hangs or takes very long

Common causes

Long-running transactions preventing the mirror from reaching the SYNCHRONIZED state.
Large transaction log that takes a long time to send/apply.
Insufficient network bandwidth or latency spikes.
Mirroring suspended or database in a restoring/recovery state.

How to diagnose

Inspect sys.dm_tran_database_transactions and sys.dm_exec_requests for active transactions.

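Monitor the log send queue and the redo queue. sys.database_mirroring does not expose these values; read them from the Database Mirroring performance counters (Database Mirroring Monitor shows the same figures). The send queue is meaningful on the principal, the redo queue on the mirror:


SELECT RTRIM(instance_name) AS DatabaseName,
       RTRIM(counter_name)  AS CounterName,
       cntr_value           AS ValueKB
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Mirroring%'
  AND counter_name IN ('Log Send Queue KB', 'Redo Queue KB')
  AND instance_name <> '_Total';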

Check wait stats and network metrics.
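
To find the transactions that are holding things up, the two DMVs named above can be joined through sys.dm_tran_session_transactions (a standard DMV that this article does not otherwise mention). This is a rough sketch that assumes the placeholder database name YourDB.

```sql
-- Run on the principal. Lists open transactions that have written to
-- YourDB's log, oldest first, with the owning session and what it is doing.
SELECT
    st.session_id,
    DB_NAME(dt.database_id)                AS DatabaseName,
    dt.database_transaction_begin_time,
    dt.database_transaction_log_bytes_used,
    r.status,
    r.command,
    r.wait_type
FROM sys.dm_tran_database_transactions AS dt
JOIN sys.dm_tran_session_transactions AS st
    ON st.transaction_id = dt.transaction_id
LEFT JOIN sys.dm_exec_requests AS r          -- NULL columns when the session is idle
    ON r.session_id = st.session_id
WHERE dt.database_id = DB_ID(N'YourDB')
  AND dt.database_transaction_begin_time IS NOT NULL
ORDER BY dt.database_transaction_begin_time;
```

A session with an old begin time and no active request is usually an application that left a transaction open.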

Fixes

If long transactions exist, either wait for completion or, where acceptable, kill blocking sessions carefully.
Increase network throughput or schedule failovers during low-activity windows.
Before a planned failover, back up the transaction log and let the send and redo queues drain; avoid starting index rebuilds or bulk loads just beforehand. Shrinking the log file does not shorten failover.

Resume mirroring if suspended:


ALTER DATABASE YourDB SET PARTNER RESUME;
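
Once the session is healthy again, a planned manual failover can be guarded so that it only runs when the database is synchronized. This is a sketch, assuming the placeholder name YourDB and that the statement is issued on the current principal.

```sql
-- Manual failover is issued from the master database on the principal,
-- because the failover disconnects sessions that are using YourDB.
USE master;

IF EXISTS (
    SELECT 1
    FROM sys.database_mirroring
    WHERE database_id = DB_ID(N'YourDB')
      AND mirroring_role_desc  = 'PRINCIPAL'
      AND mirroring_state_desc = 'SYNCHRONIZED'
)
BEGIN
    ALTER DATABASE YourDB SET PARTNER FAILOVER;
END
ELSE
BEGIN
    PRINT 'YourDB is not a synchronized principal on this instance; failover skipped.';
END
```

Skipping the failover when the state is anything other than SYNCHRONIZED avoids starting a role change that would otherwise fail or wait.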

4. Symptom: Database state shows SUSPENDED or DISCONNECTED

Common causes

Network interruptions between principal, mirror, or witness.
Mirroring endpoint stopped or misconfigured.
Log send/receive errors due to disk space or permission issues.
Version or patch mismatches causing communication problems.

How to diagnose

Review SQL Server error log for mirroring endpoint errors, authentication failures, or IO problems.

Check endpoint state:


SELECT name, role_desc, state_desc FROM sys.database_mirroring_endpoints;

Verify disk space and I/O errors on each server.
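
The endpoint check can be extended to show the listening port and the authentication and encryption settings, since mismatches between partners are a frequent cause of DISCONNECTED sessions. This is a sketch that joins two standard catalog views; compare the output across principal, mirror, and witness.

```sql
-- Run on each instance and compare the results side by side.
SELECT
    e.name,
    e.state_desc,                 -- should be STARTED
    e.role_desc,                  -- for example PARTNER or WITNESS
    t.port,                       -- must be open in the firewall
    e.connection_auth_desc,
    e.encryption_algorithm_desc
FROM sys.database_mirroring_endpoints AS e
JOIN sys.tcp_endpoints AS t
    ON t.endpoint_id = e.endpoint_id;
```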

Fixes

Start the mirroring endpoint if it is stopped (replace [Mirroring] with the endpoint name returned by the query above):


ALTER ENDPOINT [Mirroring] STATE = STARTED;

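Re-establish the session if it had to be removed. Run SET PARTNER on the mirror first (pointing at the principal), then on the principal (pointing at the mirror):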


ALTER DATABASE YourDB SET PARTNER = 'TCP://mirror_server:5022';

Free disk space and ensure SQL Server service account has proper permissions.
Apply matching SQL Server cumulative updates on both instances to avoid protocol mismatches.

5. Symptom: Failover succeeds but applications fail to reconnect

Common causes

Applications use server name in connection string pointing to former principal.
DNS cache still points clients to the old IP, or the load balancer was not updated.
Sessions relying on instance-level resources (linked servers, tempdb state) that change after failover.
Logins and permissions missing on the new principal.

How to diagnose

Test connections using the Failover Partner connection string keyword (the database name must be specified for the partner to be used):
- connection string example: Server=PrimaryServer; Failover Partner=MirrorServer; Database=YourDB;
Check application error messages and client-side DNS caching.

Verify SQL logins and permissions exist on both instances:


SELECT name, type_desc, sid FROM sys.server_principals WHERE type IN ('S','U','G');

Fixes

Use the Failover Partner connection string or DNS CNAME that points to the current principal.
Ensure SQL logins are synchronized between principal and mirror (use sp_help_revlogin to script logins with SIDs/passwords).
Inform application teams about reconnection behavior and tune client connection retry logic.
For seamless reconnection, use listeners or virtual IPs where supported.
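
Missing or mismatched logins show up after failover as orphaned database users. The following sketch lists them; it assumes the placeholder database YourDB and is meant to be run on the new principal, since the mirror copy cannot be queried before the role change. The names app_user and app_login in the comment are hypothetical.

```sql
USE YourDB;

-- Database users whose SID has no matching server login on this instance.
SELECT
    dp.name      AS DatabaseUser,
    dp.type_desc,
    dp.sid
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp
    ON sp.sid = dp.sid
WHERE dp.type IN ('S', 'U', 'G')   -- SQL users, Windows users, Windows groups
  AND dp.principal_id > 4          -- skip dbo, guest, INFORMATION_SCHEMA, sys
  AND sp.sid IS NULL;

-- To remap one orphaned user to an existing login (hypothetical names):
-- ALTER USER [app_user] WITH LOGIN = [app_login];
```

Scripting logins with their original SIDs ahead of time avoids the remapping step altogether.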

6. Symptom: Split-brain or both servers think they are principal

Common causes

Witness lost and a network partition leaves both principal and mirror with enough local state to assume the principal role.
Manual misconfiguration or forced service start without proper checks.

How to diagnose

Check each node’s sys.database_mirroring.mirroring_role_desc and witness connectivity.
Review error logs for messages about forced service or safety violations.
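
Running the same query on both nodes makes the comparison easier. This is a sketch using standard sys.database_mirroring columns; how the values should be interpreted for DBMirror specifically is an assumption based on native database mirroring behavior.

```sql
-- Run on each partner and compare the rows for the same database.
SELECT
    @@SERVERNAME                 AS ServerName,
    DB_NAME(database_id)         AS DatabaseName,
    mirroring_role_desc,          -- both reporting PRINCIPAL indicates split-brain
    mirroring_role_sequence,      -- count of role switches seen by this partner
    mirroring_state_desc,
    mirroring_witness_state_desc,
    mirroring_failover_lsn
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;
```

Differences in mirroring_role_sequence or mirroring_failover_lsn between the two nodes suggest that one of them changed role without the other's knowledge.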

Fixes

Restore network connectivity to the witness to regain quorum; manual intervention may be needed to reconcile divergent data.
If split-brain occurs, determine which node has the authoritative data (usually whichever was principal before the partition) and perform a planned role change or rebuild the affected copy.
Consider implementing stricter monitoring and alerts to detect witness or network failures early.

7. Tools and scripts for ongoing monitoring

Key monitoring metrics

mirroring_state_desc, mirroring_role_desc
Log Send Queue KB and Redo Queue KB (Database Mirroring performance counters)
mirroring_witness_name and connectivity status
Endpoint state and error log warnings

Sample scheduled check (SQL):

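SELECT DB_NAME(database_id) AS DatabaseName,
       mirroring_state_desc,
       mirroring_role_desc,
       mirroring_witness_name,
       mirroring_witness_state_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;

SELECT RTRIM(instance_name) AS DatabaseName,
       RTRIM(counter_name)  AS CounterName,
       cntr_value           AS ValueKB
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Database Mirroring%'
  AND counter_name IN ('Log Send Queue KB', 'Redo Queue KB')
  AND instance_name <> '_Total';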

Automate alerts when the Log Send Queue KB or Redo Queue KB counter exceeds a threshold, or when the state becomes SUSPENDED or DISCONNECTED.
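
One way to automate such an alert is a scheduled job step that raises a logged error when something is wrong. This is a sketch only: the 100 MB threshold is an arbitrary assumption, and the queue sizes are read from the Database Mirroring performance counters (Log Send Queue KB, Redo Queue KB) in sys.dm_os_performance_counters.

```sql
DECLARE @ThresholdKB bigint = 102400;  -- assumed threshold: 100 MB

IF EXISTS (
        -- Any mirrored database with a send or redo queue above the threshold.
        SELECT 1
        FROM sys.dm_os_performance_counters
        WHERE object_name LIKE '%Database Mirroring%'
          AND counter_name IN ('Log Send Queue KB', 'Redo Queue KB')
          AND instance_name <> '_Total'
          AND cntr_value > @ThresholdKB
   )
   OR EXISTS (
        -- Any mirrored database whose session is not exchanging log.
        SELECT 1
        FROM sys.database_mirroring
        WHERE mirroring_guid IS NOT NULL
          AND mirroring_state_desc IN ('SUSPENDED', 'DISCONNECTED')
   )
BEGIN
    -- WITH LOG writes to the SQL Server error log and Windows event log,
    -- and requires elevated permissions (for example sysadmin).
    RAISERROR (N'Database mirroring needs attention: queue above threshold or session not connected.', 16, 1) WITH LOG;
END
```

Run from a SQL Server Agent job, the raised error fails the job step, which can then trigger the job's failure notification.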

8. Preventive best practices

Use HIGH SAFETY with a witness for automatic failover when zero data loss is required.
Keep SQL Server versions and patch levels consistent across principal, mirror, and witness.
Ensure robust network design with redundant paths and low latency.
Synchronize logins, jobs, and maintenance scripts between instances.
Test failovers regularly during maintenance windows and document runbooks.
Monitor disk space, transaction log growth, and long-running transactions proactively.

9. When to consider alternatives

If DBMirror failover issues continue and your environment requires more flexible automatic failover, consider:

SQL Server Always On Availability Groups (for readable secondaries and database-level failover).
Failover Clustering (for instance-level HA).

Evaluate based on RPO/RTO requirements, licensing, and application compatibility.

Conclusion

Most DBMirror failover problems stem from networking, witness availability, transaction log backlogs, or configuration mismatches. Systematic checks of mirroring state, endpoints, logs, and network connectivity, combined with preventive monitoring and consistent patching, will resolve the majority of issues and improve failover reliability.
