title: Troubleshoot backup errors with Azure virtual machine | Microsoft Docs
description: Troubleshoot backup and restore of Azure virtual machines
services: backup
documentationcenter:
author: trinadhk
manager: shreeshd
editor:
ms.assetid: 73214212-57a4-4b57-a2e2-eaf9d7fde67f
ms.service: backup
ms.workload: storage-backup-recovery
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: article
ms.date: 08/17/2017
ms.author: trinadhk;markgal;jpallavi;

Troubleshoot Azure virtual machine backup

You can troubleshoot errors encountered while using Azure Backup with information listed in the table below.

Backup

Error: The specified Disk Configuration is not supported

Note

A private preview supports backups for VMs with unmanaged disks larger than 1 TB. For details, refer to Private preview for large disk VM backup support.

Currently, Azure Backup doesn't support disk sizes greater than 1023 GB.

  • If you have disks larger than 1 TB, attach new disks that are smaller than 1 TB.
  • Copy the data from each disk larger than 1 TB to the newly created disks.
  • Verify that all data has been copied, and then remove the disks larger than 1 TB.
  • Initiate the backup.
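
The size check described above can be sketched as a small pre-check. This is a minimal illustration in Python, assuming the disk sizes are already known in GB; the disk names and sizes in the example are hypothetical, and only the 1023 GB limit comes from the text.

```python
# Limit stated in this article: disks larger than 1023 GB are not supported.
MAX_SUPPORTED_DISK_GB = 1023


def oversized_disks(disk_sizes_gb):
    """Return the names of disks that exceed the supported size, sorted by name."""
    return sorted(
        name for name, size in disk_sizes_gb.items() if size > MAX_SUPPORTED_DISK_GB
    )


# Hypothetical example input: disk name -> size in GB.
example_disks = {"osdisk": 127, "data1": 1023, "data2": 2048}
print(oversized_disks(example_disks))
```

Any disk the function returns would need its data moved to smaller disks before the backup is initiated.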
Error details Workaround
Error: Could not perform the operation as VM no longer exists. - Stop protecting virtual machine without deleting backup data. More details at http://go.microsoft.com/fwlink/?LinkId=808124
Workaround: This happens when the primary VM is deleted, but the backup policy continues looking for a VM to back up. To fix this error:
  1. Recreate the virtual machine with the same name and same resource group name [cloud service name],
    (OR)
  2. Stop protecting the virtual machine, with or without deleting the backup data.
Error: Snapshot operation failed due to no network connectivity on the virtual machine - Ensure that VM has network access. For snapshot to succeed, either whitelist Azure datacenter IP ranges or set up a proxy server for network access. For more details, refer to http://go.microsoft.com/fwlink/?LinkId=800034. If you are already using proxy server, make sure that proxy server settings are configured correctly
Workaround: This error is thrown when outbound internet connectivity is denied on the virtual machine. The VM snapshot extension requires internet connectivity to take a snapshot of the underlying disks of the virtual machine. Learn more about how to fix snapshot failures due to blocked network access.
Error: VM agent is unable to communicate with the Azure Backup Service. - Ensure the VM has network connectivity and the VM agent is latest and running. For more information, please refer to http://go.microsoft.com/fwlink/?LinkId=800034
Workaround: This error is thrown if there is a problem with the VM agent or if network access to the Azure infrastructure is blocked in some way. Learn more about debugging VM snapshot issues.
If the VM agent is not causing any issues, restart the VM. At times an incorrect VM state can cause issues, and restarting the VM resets this "bad state".
Error: VM is in Failed Provisioning State - Please restart the VM and make sure that the VM is in Running or Shut-down state for backup
Workaround: This occurs when an extension failure puts the VM into a failed provisioning state. Go to the extensions list and check for a failed extension. If there is one, remove it and try restarting the virtual machine. If all extensions are in a running state, check whether the VM agent service is running. If not, restart the VM agent service.
Error: VMSnapshot extension operation failed for managed disks - Please retry the backup operation. If the issue repeats, follow the instructions at 'http://go.microsoft.com/fwlink/?LinkId=800034'. If it fails further, please contact Microsoft support
Workaround: This error occurs when the Azure Backup service fails to trigger a snapshot. Learn more about debugging VM snapshot issues.
Error: Could not copy the snapshot of the virtual machine, due to insufficient free space in the storage account - Ensure that storage account has free space equivalent to the data present on the premium storage disks attached to the virtual machine
Workaround: For premium VMs, the snapshot is copied to the storage account. This ensures that backup management traffic, which works on the snapshot, doesn't limit the number of IOPS available to the application using the premium disks. Microsoft recommends that you allocate only 50% of the total storage account space so that the Azure Backup service can copy the snapshot to the storage account and transfer data from this copied location to the vault.
Error: Unable to perform the operation as the VM agent is not responsive
Workaround: This error is thrown if there is a problem with the VM agent or if network access to the Azure infrastructure is blocked in some way. For Windows VMs, check the VM agent service status in Services and check whether the agent appears under Programs in Control Panel. Try removing the program from Control Panel and reinstalling the agent as described in the VM Agent section below. After reinstalling the agent, trigger an ad hoc backup to verify.
Error: Recovery services extension operation failed. - Please make sure that latest virtual machine agent is present on the virtual machine and agent service is running. Please retry backup operation and if it fails, contact Microsoft support.
Workaround: This error is thrown when the VM agent is out of date. Refer to the "Updating the VM Agent" section below to update the VM agent.
Error: Virtual machine doesn't exist. - Please make sure that virtual machine exists or select a different virtual machine.
Workaround: This happens when the primary VM is deleted but the backup policy continues to look for a VM to back up. To fix this error:
  1. Recreate the virtual machine with the same name and same resource group name [cloud service name],
    (OR)
  2. Stop protecting the virtual machine without deleting the backup data.
Error: Command execution failed. - Another operation is currently in progress on this item. Please wait until the previous operation is completed, and then retry
Workaround: A backup is already running on the VM, and a new job cannot be started until the existing job finishes.
Error: Copying VHDs from the backup vault timed out - Please retry the operation in a few minutes. If the problem persists, contact Microsoft Support.
Workaround: This happens if there is a transient error on the storage side, or if the backup service is not getting sufficient IOPS from the storage account hosting the VM to transfer data to the vault within the timeout period. Make sure that you followed the best practices when setting up backup. Try moving the VM to a different storage account that is not loaded, and retry the backup.
Error: Backup failed with an internal error - Please retry the operation in a few minutes. If the problem persists, contact Microsoft Support
Workaround: You can get this error for two reasons:
  1. There is a transient issue in accessing the VM storage. Check Azure Status to see if there is an ongoing issue related to compute, storage, or networking in the region. Retry the backup job once the issue is resolved.
  2. The original VM has been deleted, so the recovery point cannot be taken. To keep the backup data for a deleted VM but remove the backup errors, unprotect the VM and choose the option to keep the data. This action stops the scheduled backup job and the recurring error messages.
Error: Failed to install the Azure Recovery Services extension on the selected item - The VM agent is a prerequisite for the Azure Recovery Services Extension. Install the Azure VM agent and restart the registration operation
Workaround:
  1. Check if the VM agent has been installed correctly.
  2. Ensure the flag on the VM config is set correctly.
Read more about installing the VM agent, and how to validate the VM agent installation.
Error: Extension installation failed with the error "COM+ was unable to talk to the Microsoft Distributed Transaction Coordinator"
Workaround: This usually means that the COM+ service is not running. Contact Microsoft support for help on fixing this issue.
Error: Snapshot operation failed with the VSS operation error "This drive is locked by BitLocker Drive Encryption. You must unlock this drive from the Control Panel."
Workaround: Turn off BitLocker for all drives on the VM and check whether the VSS issue is resolved.
Error: VM is not in a state that allows backups.
Workaround:
  • Check whether the VM is in a transient state between Running and Shut down. If it is, wait for the VM to reach one of those states and trigger the backup again.
  • If the VM is a Linux VM and uses the [Security Enhanced Linux] kernel module, exclude the Linux agent path (/var/lib/waagent) from the security policy so that the backup extension can be installed.
Error: Azure Virtual Machine Not Found.
Workaround: This happens when the primary VM is deleted but the backup policy continues to look for a VM to back up. To fix this error:
  1. Recreate the virtual machine with the same name and same resource group name [cloud service name],
    (OR)
  2. Disable protection for this VM so that backup jobs are no longer created.
Error: Virtual machine agent is not present on the virtual machine - Please install any prerequisite and the VM agent, and then restart the operation.
Workaround: Read more about VM agent installation, and how to validate the VM agent installation.
Error: Snapshot operation failed due to VSS Writers in bad state
Workaround: You need to restart the VSS (Volume Shadow Copy Service) writers that are in a bad state. From an elevated command prompt, run vssadmin list writers. The output lists all VSS writers and their state. For every VSS writer whose state is not "[1] Stable", restart the VSS writer by running the following commands from an elevated command prompt:
net stop serviceName
net start serviceName
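
To spot the writers that need attention, the output of vssadmin list writers can be scanned programmatically. The following is a minimal sketch in Python, assuming the usual layout in which each writer block has a "Writer name: '...'" line followed by a "State: [n] ..." line. The sample text is abbreviated and hypothetical, not captured from a real machine.

```python
import re

# Abbreviated, hypothetical sample of `vssadmin list writers` output.
SAMPLE = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [8] Failed
   Last error: Non-retryable error
"""


def unstable_writers(vssadmin_output):
    """Return the names of writers whose state line is not '[1] Stable'."""
    unstable = []
    current = None
    for raw_line in vssadmin_output.splitlines():
        line = raw_line.strip()
        name_match = re.match(r"Writer name: '(.+)'", line)
        if name_match:
            current = name_match.group(1)
        elif line.startswith("State:") and current is not None:
            if line != "State: [1] Stable":
                unstable.append(current)
            current = None
    return unstable


print(unstable_writers(SAMPLE))
```

Each name returned identifies a writer whose hosting service is a candidate for the net stop and net start commands above.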
Error: Snapshot operation failed due to a parsing failure of the configuration
Workaround: This happens due to changed permissions on the MachineKeys directory: %systemdrive%\programdata\microsoft\crypto\rsa\machinekeys
Run the following command and verify that the permissions on the MachineKeys directory are the default ones:
icacls %systemdrive%\programdata\microsoft\crypto\rsa\machinekeys

Default permissions are:
Everyone:(R,W)
BUILTIN\Administrators:(F)

If the permissions on the MachineKeys directory differ from the defaults, follow the steps below to correct the permissions, delete the certificate, and trigger the backup.
  1. Fix permissions on the MachineKeys directory.
    Using Explorer Security Properties and Advanced Security Settings on the directory, reset permissions back to the default values, remove any extra (non-default) user object from the directory, and ensure that the 'Everyone' permissions have special access for:
    -List folder / read data
    -Read attributes
    -Read extended attributes
    -Create files / write data
    -Create folders / append data
    -Write attributes
    -Write extended attributes
    -Read permissions

  2. Delete all certificates with the field 'Issued To' = "Windows Azure Service Management for Extensions" or "Windows Azure CRP Certificate Generator".
    • Open the Certificates (Local Computer) console.
    • Delete all certificates (under Personal -> Certificates) whose 'Issued To' field matches either of these names.
  3. Trigger VM backup.
Error: Validation failed as virtual machine is encrypted with BEK alone. Backups can be enabled only for virtual machines encrypted with both BEK and KEK.
Workaround: Encrypt the virtual machine using both a BitLocker Encryption Key (BEK) and a Key Encryption Key (KEK), and then enable backup.
Error: Azure Backup Service does not have sufficient permissions to Key Vault for Backup of Encrypted Virtual Machines.
Workaround: Grant the Backup service these permissions in PowerShell, using the steps in the Enable Backup section of the PowerShell documentation.
Error: Installation of snapshot extension failed with error - COM+ was unable to talk to the Microsoft Distributed Transaction Coordinator
Workaround: Try to start the Windows service "COM+ System Application" (from an elevated command prompt: net start COMSysApp).
If the service fails to start, follow these steps:
  1. Validate that the logon account of the "Distributed Transaction Coordinator" service is "Network Service". If it is not, change it to "Network Service", restart the service, and then try to start the "COM+ System Application" service.
  2. If it still fails to start, uninstall and reinstall the "Distributed Transaction Coordinator" service:
    - Stop the MSDTC service
    - Open a command prompt (cmd)
    - Run the command "msdtc -uninstall"
    - Run the command "msdtc -install"
    - Start the MSDTC service
  3. Start the Windows service "COM+ System Application". After it has started, trigger the backup from the portal.
Error: Snapshot operation failed due to COM+ error
Workaround: Restart the Windows service "COM+ System Application" (from an elevated command prompt: net start COMSysApp). If the issue persists, restart the VM. If restarting the VM doesn't help, try removing the VMSnapshot extension and trigger the backup manually.
Error: Failed to freeze one or more mount-points of the VM to take a file-system consistent snapshot
Workaround: Use the following steps:
  1. Check the file-system state of all mounted devices using the 'tune2fs' command.
    For example: tune2fs -l /dev/sdb1 | grep "Filesystem state"
  2. Unmount the devices whose file-system state is not clean, using the 'umount' command.
  3. Run a file-system consistency check on these devices using the 'fsck' command.
  4. Mount the devices again and retry the backup.
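
When several devices are mounted, step 1 above can be summarized with a short script. This is a minimal sketch in Python that assumes the tune2fs -l output for each device has already been collected as text. The device names and output lines in the example are hypothetical.

```python
def filesystem_state(tune2fs_output):
    """Extract the value of the 'Filesystem state' line, or None if it is absent."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem state:"):
            return line.split(":", 1)[1].strip()
    return None


def devices_needing_fsck(outputs_by_device):
    """Return devices whose state is not 'clean' (a missing state line is flagged too)."""
    return sorted(
        device
        for device, output in outputs_by_device.items()
        if filesystem_state(output) != "clean"
    )


# Hypothetical example: device -> relevant part of its `tune2fs -l` output.
example_outputs = {
    "/dev/sdb1": "Filesystem state:         clean",
    "/dev/sdc1": "Filesystem state:         not clean",
}
print(devices_needing_fsck(example_outputs))
```

The devices it returns are the ones to unmount and check with fsck in steps 2 and 3.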
Error: Snapshot operation failed due to failure in creating secure network communication channel
Workaround:
  1. Open Registry Editor by running regedit.exe in elevated mode.
  2. Identify all versions of the .NET Framework present on the system. They are listed under the registry key "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft".
  3. For each .NET Framework version present under the registry key, add the following value:
    "SchUseStrongCrypto"=dword:00000001
Error: Snapshot operation failed due to failure in installation of Visual C++ Redistributable for Visual Studio 2012
Workaround: Navigate to C:\Packages\Plugins\Microsoft.Azure.RecoveryServices.VMSnapshot\agentVersion and install vcredist2012_x64. Make sure that the registry key that allows this service installation is set to the correct value: the value of registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Msiserver should be 3 and not 4. If you still have issues with the installation, restart the installation service by running MSIEXEC /UNREGISTER followed by MSIEXEC /REGISTER from an elevated command prompt.

Jobs

Error details Workaround
Error: Cancellation is not supported for this job type - Please wait until the job completes.
Workaround: None
Error: The job is not in a cancelable state - Please wait until the job completes.
OR
The selected job is not in a cancelable state - Please wait for the job to complete.
Workaround: In all likelihood, the job is almost completed. Please wait until the job is completed.
Error: Cannot cancel the job because it is not in progress - Cancellation is only supported for jobs which are in progress. Please attempt cancel on an in progress job.
Workaround: This happens due to a transitory state. Wait for a minute and retry the cancel operation.
Error: Failed to cancel the Job - Please wait till job finishes.
Workaround: None

Restore

Error details Workaround
Error: Restore failed with Cloud Internal error
Workaround:
  1. The cloud service to which you are trying to restore is configured with DNS settings. You can check by running:
    $deployment = Get-AzureDeployment -ServiceName "ServiceName" -Slot "Production"
    Get-AzureDns -DnsSettings $deployment.DnsSettings
    If an Address is configured, DNS settings are configured.
  2. The cloud service to which you are trying to restore is configured with a reserved IP, and existing VMs in the cloud service are in a stopped state.
    You can check whether a cloud service has a reserved IP by using the following PowerShell cmdlets:
    $deployment = Get-AzureDeployment -ServiceName "servicename" -Slot "Production"
    $deployment.ReservedIPName
  3. You are trying to restore a virtual machine with one of the following special network configurations into the same cloud service:
    - Virtual machines under load balancer configuration (Internal and external)
    - Virtual machines with multiple Reserved IPs
    - Virtual machines with multiple NICs
    Please select a new cloud service in the UI or please refer to restore considerations for VMs with special network configurations.
Error: The selected DNS name is already taken - Please specify a different DNS name and try again.
Workaround: The DNS name here refers to the cloud service name (usually ending with .cloudapp.net), which must be unique. If you encounter this error, choose a different VM name during restore.

This error is shown only to users of the Azure portal. The restore operation through PowerShell succeeds because it only restores the disks and doesn't create the VM. The error occurs when you explicitly create the VM after the disk restore operation.
Error: The specified virtual network configuration is not correct - Please specify a different virtual network configuration and try again.
Workaround: None
Error: The specified cloud service is using a reserved IP, which doesn't match with the configuration of the virtual machine being restored - Please specify a different cloud service, which is not using reserved IP, or choose another recovery point to restore from.
Workaround: None
Error: Cloud service has reached limit on number of input end points - Retry the operation by specifying a different cloud service or by using an existing endpoint.
Workaround: None
Error: Backup vault and target storage account are in two different regions - Ensure that the storage account specified in restore operation is in the same Azure region as the backup vault.
Workaround: None
Error: Storage Account specified for the restore operation is not supported - Only Basic/Standard storage accounts with locally redundant or geo redundant replication settings are supported. Please select a supported storage account
Workaround: None
Error: Type of Storage Account specified for restore operation is not online - Make sure that the storage account specified in restore operation is online
Workaround: This might happen because of a transient error in Azure Storage or due to an outage. Choose another storage account.
Error: Resource Group Quota has been reached - Please delete some resource groups from Azure portal or contact Azure support to increase the limits.
Workaround: None
Error: Selected subnet does not exist - Please select a subnet which exists
Workaround: None
Error: Backup Service does not have authorization to access resources in your subscription.
Workaround: To resolve this, first restore the disks using the steps in the section Restore backed up disks in Choosing VM restore configuration. Then use the PowerShell steps in Create a VM from restored disks to create the full VM from the restored disks.

Backup or Restore taking time

If your backup takes longer than 12 hours or your restore takes longer than 6 hours:

VM Agent

Setting up the VM Agent

Typically, the VM Agent is already present in VMs that are created from the Azure gallery. However, virtual machines that are migrated from on-premises datacenters would not have the VM Agent installed. For such VMs, the VM Agent needs to be installed explicitly.

For Windows VMs:

  • Download and install the agent MSI. You need Administrator privileges to complete the installation.
  • For classic virtual machines, update the VM property to indicate that the agent is installed. This step is not required for Resource Manager virtual machines.

For Linux VMs:

  • Install the latest agent from the distribution repository. We strongly recommend installing the agent only through the distribution repository. For details on the package name, refer to the Linux agent repository.
  • For classic VMs, update the VM property to indicate that the agent is installed. This step is not required for Resource Manager virtual machines.

Updating the VM Agent

For Windows VMs:

  • Updating the VM Agent is as simple as reinstalling the VM Agent binaries. However, you need to ensure that no backup operation is running while the VM Agent is being updated.

For Linux VMs:

  • Follow the instructions on Updating Linux VM Agent. We strongly recommend updating the agent only through the distribution repository. We do not recommend downloading the agent code directly from GitHub and updating it. If the latest agent is not available for your distribution, contact the distribution's support for instructions on how to install the latest agent. You can check the latest Windows Azure Linux agent information in the GitHub repository.

Validating VM Agent installation

How to check for the VM Agent version on Windows VMs:

  1. Log on to the Azure virtual machine and navigate to the folder C:\WindowsAzure\Packages. You should find the WaAppAgent.exe file present.
  2. Right-click the file, go to Properties, and then select the Details tab. The Product Version field should be 2.6.1198.718 or higher.
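
Dotted version numbers should be compared part by part as integers, not as strings. The following minimal Python sketch illustrates the comparison against the minimum version given above; the function names are hypothetical, and the version strings in the example are made up for illustration.

```python
# Minimum VM agent product version stated in this article.
MIN_AGENT_VERSION = "2.6.1198.718"


def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def agent_is_current(product_version, minimum=MIN_AGENT_VERSION):
    """Return True if product_version is at least the minimum version."""
    return parse_version(product_version) >= parse_version(minimum)


# Illustrative values only: 99 is numerically lower than 718,
# even though "99" sorts after "718" as a string.
print(agent_is_current("2.6.1198.718"))
print(agent_is_current("2.6.1198.99"))
```

Comparing integer tuples avoids the pitfall of lexicographic comparison, where a shorter numeric field can appear larger than a longer one.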

Troubleshoot VM Snapshot Issues

VM backup relies on issuing snapshot commands to underlying storage. Not having access to storage, or delays in a snapshot task execution can cause the backup job to fail. The following can cause snapshot task failure.

  1. Network access to Storage is blocked using NSG
    Learn more about how to enable network access to Storage, either by whitelisting IPs or through a proxy server.

  2. VMs with SQL Server backup configured can cause snapshot task delay
    By default, VM backup issues a VSS Full backup on Windows VMs. On VMs that run SQL Server with SQL Server backup configured, this might delay snapshot execution. Set the following registry key if you are experiencing backup failures because of snapshot issues.

    [HKEY_LOCAL_MACHINE\SOFTWARE\MICROSOFT\BCDRAGENT]
    "USEVSSCOPYBACKUP"="TRUE"
    
  3. VM status is reported incorrectly because the VM was shut down in RDP.
    If you shut down the virtual machine in RDP, check in the portal that the VM status is reflected correctly. If it is not, shut down the VM in the portal using the 'Shutdown' option in the VM dashboard.

  4. If more than four VMs share the same cloud service, configure multiple backup policies to stage the backup times so that no more than four VM backups start at the same time. Try to spread the backup start times an hour apart between policies.

  5. VM is running at high CPU or memory usage.
    If the virtual machine is running at high CPU usage (>90%) or high memory usage, the snapshot task is queued and delayed, and eventually times out. Try an on-demand backup in such situations.
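
The staggering advice in item 4 above can be planned with a few lines of code. This is a minimal sketch in Python, not an Azure Backup API call: it only groups VM names into hypothetical policies of at most four VMs each and assigns start times one hour apart. The VM names and the 22:00 first start time are assumptions for illustration.

```python
from datetime import datetime, timedelta


def stagger_backups(vm_names, first_start="22:00", max_per_policy=4, gap_hours=1):
    """Group VMs into policies of at most max_per_policy, gap_hours apart.

    Returns a list of (start_time, [vm names]) tuples, one per policy.
    """
    start = datetime.strptime(first_start, "%H:%M")
    schedule = []
    for offset in range(0, len(vm_names), max_per_policy):
        policy_index = offset // max_per_policy
        policy_start = start + timedelta(hours=gap_hours * policy_index)
        group = vm_names[offset:offset + max_per_policy]
        schedule.append((policy_start.strftime("%H:%M"), group))
    return schedule


# Hypothetical cloud service with nine VMs.
vms = ["vm%d" % n for n in range(1, 10)]
for start_time, group in stagger_backups(vms):
    print(start_time, group)
```

Each tuple corresponds to one backup policy that would then be created with that start time and assigned to the listed VMs.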


Networking

Like all extensions, the Backup extension needs access to the public internet to work. Lack of access to the public internet can manifest itself in various ways:

  • The extension installation can fail
  • The backup operations (like disk snapshot) can fail
  • Displaying the status of the backup operation can fail

The Backup extension needs to resolve public internet addresses. Check the DNS configuration for the VNET and ensure that the Azure URIs can be resolved.

Once the name resolution is done correctly, access to the Azure IPs also needs to be provided. To unblock access to the Azure infrastructure, follow one of these steps:

  1. Whitelist the Azure datacenter IP ranges.
    • Get the list of Azure datacenter IPs to be whitelisted.
    • Unblock the IPs using the New-NetRoute cmdlet. Run this cmdlet within the Azure VM, in an elevated PowerShell window (run as Administrator).
    • Add rules to the NSG (if you have one in place) to allow access to the IPs.
  2. Create a path for HTTP traffic to flow.
    • If you have a network restriction in place (a Network Security Group, for example), deploy an HTTP proxy server to route the traffic.
    • Add rules to the NSG (if you have one in place) to allow access to the internet from the HTTP proxy.

Note

DHCP must be enabled inside the guest for IaaS VM Backup to work. If you need a static private IP, you should configure it through the platform. The DHCP option inside the VM should be left enabled. View more information about Setting a Static Internal Private IP.