title | description | services | ms.service | ms.subservice | ms.custom | ms.devlang | ms.topic | author | ms.author | ms.reviewer | manager | ms.date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SQL Database disaster recovery | Microsoft Docs |
Learn how to recover a database from a regional datacenter outage or failure with the Azure SQL Database active geo-replication, and geo-restore capabilities. |
sql-database |
sql-database |
operations |
conceptual |
anosov1960 |
sashan |
carlrab |
craigg |
07/16/2018 |
Azure SQL Database offers the following capabilities for recovering from an outage:
To learn about business continuity scenarios and the features supporting these scenarios, see Business continuity.
Note
If you are using zone-redundant Premium or Business Critical databases or pools, the recovery process is automated and the rest of this material does not apply.
For success with recovery to another data region using either failover groups or geo-redundant backups, you need to prepare a server in another data center outage to become the new primary server should the need arise as well as have well-defined steps documented and tested to ensure a smooth recovery. These preparation steps include:
- Identify the logical server in another region to become the new primary server. For geo-restore, this is generally a server in the paired region for the region in which your database is located. This eliminates the additional traffic cost during the geo-restoring operations.
- Identify, and optionally define, the server-level firewall rules needed on for users to access the new primary database.
- Determine how you are going to redirect users to the new primary server, such as by changing connection strings or by changing DNS entries.
- Identify, and optionally create, the logins that must be present in the master database on the new primary server, and ensure these logins have appropriate permissions in the master database, if any. For more information, see SQL Database security after disaster recovery
- Identify alert rules that need to be updated to map to the new primary database.
- Document the auditing configuration on the current primary database
- Perform a disaster recovery drill. To simulate an outage for geo-restore, you can delete or rename the source database to cause application connectivity failure. To simulate an outage using failover groups, you can disable the web application or virtual machine connected to the database or failover the database to cause application connectivity failures.
The recovery operation impacts the application. It requires changing the SQL connection string or redirection using DNS and could result in permanent data loss. Therefore, it should be done only when the outage is likely to last longer than your application's recovery time objective. When the application is deployed to production you should perform regular monitoring of the application health and use the following data points to assert that the recovery is warranted:
- Permanent connectivity failure from the application tier to the database.
- The Azure portal shows an alert about an incident in the region with broad impact.
Note
If you are using failover groups and chose automatic failover, the recovery process is automated and transparent to the application.
Depending on your application tolerance to downtime and possible business liability you can consider the following recovery options.
Use the Get Recoverable Database (LastAvailableBackupDate) to get the latest Geo-replicated restore point.
The Azure teams work diligently to restore service availability as quickly as possible but depending on the root cause it can take hours or days. If your application can tolerate significant downtime you can simply wait for the recovery to complete. In this case, no action on your part is required. You can see the current service status on our Azure Service Health Dashboard. After the recovery of the region, your application’s availability is restored.
If your application’s downtime can result in business liability, you should be using failover groups. It enables the application to quickly restore availability in a different region in case of an outage. Learn how to configure failover groups.
To restore availability of the database(s) you need to initiate the failover to the secondary server using one of the supported methods.
Use one of the following guides to fail over to a geo-replicated secondary database:
- Fail over to a geo-replicated secondary server using the Azure portal
- Fail over to the secondary server using PowerShell
If your application’s downtime does not result in business liability you can use geo-restore as a method to recover your application database(s). It creates a copy of the database from its latest geo-redundant backup.
If you are using geo-restore to recover from an outage, you must make sure that the connectivity to the new databases is properly configured so that the normal application function can be resumed. This is a checklist of tasks to get your recovered database production ready.
Because your recovered database resides in a different server, you need to update your application’s connection string to point to that server.
For more information about changing connection strings, see the appropriate development language for your connection library.
You need to make sure that the firewall rules configured on server and on the database match those that were configured on the primary server and primary database. For more information, see How to: Configure Firewall Settings (Azure SQL Database).
You need to make sure that all the logins used by your application exist on the server which is hosting your recovered database. For more information, see Security Configuration for geo-replication.
Note
You should configure and test your server firewall rules and logins (and their permissions) during a disaster recovery drill. These server-level objects and their configuration may not be available during the outage.
You need to make sure your existing alert rule settings are updated to map to the recovered database and the different server.
For more information about database alert rules, see Receive Alert Notifications and Track Service Health.
If auditing is required to access your database, you need to enable Auditing after the database recovery. For more information, see Database auditing.
- To learn about Azure SQL Database automated backups, see SQL Database automated backups
- To learn about business continuity design and recovery scenarios, see Continuity scenarios
- To learn about using automated backups for recovery, see restore a database from the service-initiated backups