Backing up Virtual Machines to Cloud Storage
The last few years have seen explosive growth in the server virtualization market. Server virtualization offers many benefits, including:
- Server consolidation;
- Isolation of applications;
- Faster server provisioning and portability of guest VMs;
- Reduced data center footprint and power consumption;
- Lower cost for test servers.
However, virtualization also creates new challenges, one of which is how to back up virtual machines. Backing up a virtual machine can consume a lot of storage and network resources, and when a host machine runs multiple virtual machines, backing them all up becomes even more challenging. If all virtual machines are backed up at the same time, server performance may slow down dramatically.
DriveHQ’s Cloud Backup service, offered as part of the Cloud IT solution, has successfully backed up data for tens of thousands of businesses, and our efficient and reliable backup software has also backed up many VMs for large enterprise customers. This article discusses virtual machine backup methods and strategies, their advantages and disadvantages, and why DriveHQ’s cloud backup solution is far better than other backup solutions.
1. Decide What to Back Up
As with any backup implementation, you first need to decide what data needs to be backed up. Traditionally, there are two approaches: Image Backup and Data Backup.
1.1. Image Backup
Image Backup copies the entire system image (or disk image). Its main advantage is that if the operating system fails and cannot boot, the backup software can restore the entire system, including user data. The disadvantages are that Image Backup often takes much longer, requires much more storage space to keep the backup copies, and makes it hard to restore individual files. Moreover, an image backup usually can only be restored to the same computer with the same hardware configuration.
1.2. Data Backup (or file backup)
Data Backup usually does not back up the operating system and program files; instead, it backs up user data files. If the operating system fails, the user can restore it with a system recovery disk, reinstall other programs, and then use the backup software to restore all user data. The advantages of Data Backup are that it usually takes much less time, requires much less storage space, and makes it much easier to restore individual files and folders.
2. Methods to Back Up Virtual Machines
(1) Back up the host server with Image Backup
This is probably the simplest method. If you back up the entire host server, all guest VMs are backed up as well. Since the guest OS constantly modifies the virtual disk files, the backup software needs to take a snapshot of them. With Volume Shadow Copy Service (VSS) technology, this is not a problem.
The real disadvantages are similar to those mentioned in Section 1.1. Because one host server can host many VMs, some of these disadvantages become even harder to cope with.
(2) Back up the virtual machine configuration files and virtual hard disks (VHDs)
Compared with Method (1), this method does not back up the host server itself. If the host server hosts only one VM, the backup can be up to 100% faster; if it hosts many VMs, the speed gain is less significant.
If the host server is used solely to host guest VMs, backing up the host server itself is not important: instead of being restored from an Image Backup, it can easily be rebuilt from the recovery disk without any backup.
Another option is to create an Image Backup of the host server alone, excluding the guest VMs. Because the files on the host server do not contain the actual business data, this image only needs to be created once or refreshed at a much lower frequency (e.g., once a month or once a year).
(3) Back up a virtual machine using Image Backup
In this respect, a virtual machine is just like a physical machine, so it can also be backed up with regular Image Backup. Because the virtual machine’s disk images are the VHD files themselves, there is little difference between Method (3) and Method (2). If the virtualization software does not support VSS, Method (3) must be used; fortunately, all mainstream virtualization solutions support VSS.
Method (2) works better from a manageability point of view: instead of creating backup tasks in each guest VM, you can perform the backup centrally from the host server.
Method (3) works better from a portability point of view. A virtual machine may be moved from one host server to another; if Method (3) is used and the backup destination is a mapped network drive, nothing needs to be changed.
(4) Back up data files in a virtual machine
Two of the main advantages of virtualization are “fast provisioning” and “portability”. Thanks to these two advantages, Image Backup is generally not needed:
First, the host server is reduced to a bare-bones standard configuration with just the operating system and the virtualization software. Instead of being restored from a backup, it can easily be rebuilt from the recovery disk, and no user data or business data will be lost.
Second, the guest OS can also easily be restored without a backup. In fact, a pre-provisioned guest OS loaded with company-specific software can be kept running (or backed up) as a standby VM. If the main guest VM fails, instead of restoring the entire guest VM, you only need to boot the standby VM and restore the user data to it. This can dramatically reduce system downtime.
As the above analysis shows, with a little preparation, Image Backup is generally not needed for your host servers and virtual machines. You might still need an image backup, but usually only once, or as a recurring backup at a very low frequency.
As mentioned above, Data Backup is far more efficient and flexible than Image Backup.
3. The Challenges of Backing Up Virtual Machines to the Cloud
Backing up virtual machines to the cloud poses another set of challenges. Image Backup can create huge backup files: a single file can be 200GB or larger. Even if you back up only the VHD files, each one can be 100GB or larger. Files this large are hard to upload, download and manipulate, and eventually backing them up online becomes too inefficient or unreliable.
Incremental backup can be used to mitigate this problem. However, it must be very carefully designed and tested to work well with cloud-based backup, and even then the results may vary dramatically depending on the actual scenario.
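A quick back-of-the-envelope calculation shows why files of this size are hard to move online. The sketch below is illustrative only: it assumes decimal units (1 GB = 8,000 megabits), a hypothetical 80% link efficiency, and example uplink speeds that are not taken from this article.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to upload or download size_gb over a link.

    Assumes decimal units (1 GB = 8,000 megabits) and that only
    `efficiency` of the nominal bandwidth is achieved in practice.
    """
    megabits = size_gb * 8_000
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Example uplink speeds in Mbps (assumptions; substitute your own).
    for mbps in (10, 50, 100):
        print(f"200GB image at {mbps} Mbps: {transfer_hours(200, mbps):.1f} hours")
        print(f"100GB VHD at {mbps} Mbps: {transfer_hours(100, mbps):.1f} hours")
```

At an assumed 10 Mbps uplink, the 200GB image works out to roughly 55 hours of continuous transfer, and an interruption without resume support may mean starting over.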
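The basic mechanism of block-level incremental backup, and one way it can backfire, can be sketched in Python. This is a minimal illustration, not DriveHQ’s or any product’s implementation: it assumes fixed 4-byte blocks (real tools use far larger blocks) and SHA-256 block hashes, and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4  # assumed tiny block size so the demo is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and hash each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks of `new` that differ from `old`."""
    old_hashes = block_hashes(old, block_size)
    delta = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_hashes) or old_hashes[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new version from the previous version plus a delta."""
    blocks = [old[i:i + block_size] for i in range(0, len(old), block_size)]
    n_blocks = (new_length + block_size - 1) // block_size
    # Grow or shrink the block list to the new block count.
    blocks = blocks[:n_blocks] + [b""] * (n_blocks - len(blocks))
    for index, chunk in delta.items():
        blocks[index] = chunk
    return b"".join(blocks)[:new_length]


if __name__ == "__main__":
    original = b"AAAABBBBCCCCDDDD"
    edited = b"AAAABBBXCCCCDDDD"  # one byte changed in place
    shifted = b"Z" + original     # one byte inserted at the front

    for label, new in (("in-place edit", edited), ("insertion", shifted)):
        delta = changed_blocks(original, new)
        assert apply_delta(original, delta, len(new)) == new
        print(f"{label}: {len(delta)} of {len(block_hashes(new))} blocks must be uploaded")
```

Running it reports that the in-place edit needs 1 of 4 blocks uploaded, while inserting a single byte at the front shifts every block so that all 5 must be uploaded. Restoring also requires the previous version plus every delta applied in order, which is why unmerged incremental chains add restore steps.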
Incremental backup can reduce the amount of data to be backed up in many cases; however, it has the following disadvantages:
- It adds more steps to the backup and restore process, making it more error-prone;
- In certain cases it can be extremely inefficient: if the data is changed (e.g., modified, defragmented, reorganized, re-ordered, re-indexed or compacted) very frequently, incremental backup may even increase the amount of data to be transferred;
- When an operation produces thousands of small changes in a VHD file or an Image Backup file, uploading and applying those changes can take a very long time, depending on the type of change, especially if the changes require moving a large chunk of data from one location to another or change the total file size. Moreover, any failure in applying an incremental change will corrupt the entire file;
- It uses more CPU resources, because the incremental changes to a file are typically calculated with a rolling checksum algorithm;
- In certain cases, incremental backups are not merged with the previous full backup, which actually increases total storage usage as well as the time and number of steps needed to restore.
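The rolling checksum mentioned above is what lets an incremental backup find unchanged data even after it has shifted position. The sketch below follows the weak checksum described for rsync (two running sums modulo 2^16, updated in constant time per byte) and confirms candidate matches with SHA-256; the function names and example data are illustrative assumptions. Evaluating the checksum at every byte offset is the extra CPU work the list refers to.

```python
import hashlib

MOD = 1 << 16  # modulus used by rsync's weak checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Return the rsync-style weak checksum (a, b) of a block."""
    n = len(block)
    a = sum(block) % MOD
    # Byte i (0-based) gets weight n - i, so the first byte weighs most.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_block(data: bytes, block: bytes) -> int:
    """Return the first offset where `block` occurs in `data`, or -1."""
    n = len(block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(block)
    strong = hashlib.sha256(block).digest()
    a, b = weak_checksum(data[:n])
    for offset in range(len(data) - n + 1):
        if offset > 0:
            # Drop the byte leaving the window, add the byte entering it.
            a, b = roll(a, b, data[offset - 1], data[offset + n - 1], n)
        # A weak match is only a candidate; SHA-256 rules out collisions.
        if (a, b) == target and hashlib.sha256(data[offset:offset + n]).digest() == strong:
            return offset
    return -1


if __name__ == "__main__":
    shifted = b"Z" + b"AAAABBBBCCCCDDDD"  # one byte inserted at the front
    print(find_block(shifted, b"CCCC"))  # prints 9
```

Here the block `CCCC` is found at offset 9 in the shifted data, so it would not need to be re-uploaded, unlike in a fixed-block comparison.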
4. The DriveHQ Solution
DriveHQ has successfully backed up VMs for many large enterprise clients. To make the backup process efficient and smooth, we recommend the following best practices. While not strictly required, they can improve performance and reliability and dramatically lower cost.
4.1 Prepare your Virtual Machine
We recommend separating the OS, application and data volumes into three VHDs. At a minimum, create two VHDs: one for the OS and one for the apps and data, or one for the OS and apps and one for the data.
Moving the memory swap file or page file to the apps or data volume can help minimize the amount of data on the OS volume.
With the above approach, you can minimize the size of the OS VHD file. For Windows XP or Windows Server 2003, the OS volume needs only about 16GB or less; for Windows 7 / Server 2008 R2, it needs only about 32GB or less.
4.2 Back Up the Virtual Machine Operating System
With proper preparation, the VHD file containing the operating system can be relatively small (less than 32GB) and can easily be backed up locally or to cloud storage. This VHD needs to be backed up only once, because the OS rarely changes, and the changes it does undergo will not affect your backup or restore.
Backing up the OS VHD may not even be necessary, since you can reinstall the OS or use a recovery disk to restore it. On the other hand, having the OS VHD at hand can reduce the time needed to restore the OS.
If you do decide to back up your VM’s operating system to DriveHQ cloud storage, run the DriveHQ Online Backup client software on the host server, create a backup task using the Database / Email Backup Wizard, and select the source folder so that the task includes the virtual machine configuration files and the OS VHD file. Exclude the other VHD files from the Manage Tasks > Edit Backup Task window.
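The DriveHQ client is configured through its wizard and task windows, but the selection rule described above (include the configuration files and the OS VHD, exclude the other VHDs) can be expressed as a simple filter. The sketch below is a hypothetical illustration, not DriveHQ’s configuration format; the folder layout, file names such as `Web01-OS.vhdx`, and the assumption that virtual disks end in `.vhd` or `.vhdx` are all made up for the example. `PureWindowsPath` is used so the Windows-style paths parse the same way on any platform.

```python
from pathlib import PureWindowsPath

VHD_EXTENSIONS = {".vhd", ".vhdx"}  # virtual disk file extensions (assumed set)


def select_for_os_backup(paths: list[str], os_vhd_name: str) -> list[str]:
    """Keep every non-VHD file (e.g. VM configuration) plus the OS VHD only."""
    selected = []
    for path in paths:
        p = PureWindowsPath(path)  # Windows paths parse correctly on any platform
        is_vhd = p.suffix.lower() in VHD_EXTENSIONS
        if is_vhd and p.name.lower() != os_vhd_name.lower():
            continue  # apps/data VHDs are excluded from this task
        selected.append(path)
    return selected


if __name__ == "__main__":
    # Hypothetical contents of one VM's folder on the host server.
    listing = [
        r"D:\VMs\Web01\Web01.vmcx",
        r"D:\VMs\Web01\Web01-OS.vhdx",
        r"D:\VMs\Web01\Web01-Apps.vhdx",
        r"D:\VMs\Web01\Web01-Data.vhdx",
    ]
    for path in select_for_os_backup(listing, "Web01-OS.vhdx"):
        print(path)
```

The apps and data VHDs are left out of this task so that they can be handled by the separate tasks described in the following sections.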
4.3 Back Up the Virtual Machine Software Applications
If the software applications are installed on the OS VHD, they are already backed up with the OS.
If you have the software installation disks, backing up the apps VHD is not necessary. Again, having the apps VHD ready can reduce the time needed to restore the software applications.
If you do decide to back up your VM’s software applications to DriveHQ cloud storage, run the DriveHQ Online Backup client software on the host server, create a backup task using the Database / Email Backup Wizard, and select the source folder containing the apps VHD file.
4.4 Back Up the Virtual Machine Data Files
In this case, a virtual machine is exactly the same as a physical machine. You just need to:
(1) Install DriveHQ Online Backup on the virtual machine;
(2) Create one or more backup tasks for the data that needs to be backed up. You can create multiple backup tasks to back up different folders on different schedules.
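The idea of several tasks, each with its own folders and schedule, can be modeled with a small data structure. This is a hypothetical sketch, not DriveHQ’s task format: it assumes each task simply runs at a fixed interval after its last run, and the task names, folders and intervals are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BackupTask:
    """Hypothetical description of one backup task inside the VM."""
    name: str
    folders: list[str]
    interval: timedelta              # how often the task should run
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """A task is due if it has never run or its interval has elapsed."""
        return self.last_run is None or now - self.last_run >= self.interval


def due_tasks(tasks: list[BackupTask], now: datetime) -> list[str]:
    """Return the names of the tasks that should run at `now`."""
    return [task.name for task in tasks if task.is_due(now)]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    tasks = [
        BackupTask("documents", [r"D:\Shares\Documents"], timedelta(hours=1),
                   last_run=now - timedelta(minutes=90)),
        BackupTask("database-dumps", [r"E:\Backups\SQL"], timedelta(days=1),
                   last_run=now - timedelta(hours=6)),
        BackupTask("archive", [r"F:\Archive"], timedelta(weeks=1)),
    ]
    print(due_tasks(tasks, now))  # prints ['documents', 'archive']
```

Splitting folders by how often they change lets frequently edited data be backed up often while rarely changing data is backed up far less often.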
5. Advantages of DriveHQ’s Backup Solution
Compared with other cloud backup solutions, DriveHQ’s Virtual Machine Backup solution has the following advantages:
- Minimize the amount of data to be backed up.
o DriveHQ Online Backup can dramatically reduce backup time and bandwidth usage and improve backup performance and reliability.
- Faster to restore.
o Instead of restoring huge backup files or VHD files, which could take many days just to download, enterprise customers can simply restore the OS and software apps from the recovery CD or software CDs, restore a pre-provisioned OS with the software apps already installed, or restore a much smaller OS VHD. The restore time can be reduced to a few hours or less.
- Lower cost
o Because DriveHQ Online Backup requires much less storage space, the cost is also dramatically lower.
- Ability to restore individual files and folders
o If files or folders are accidentally deleted, users don’t need to restore a huge backup file or VHD file; they can simply drag and drop the files or folders from DriveHQ cloud storage to the virtual machine.
- Minimize the chance of data corruption and the impact of data corruption
o Backing up a huge backup file or VHD file is far more error-prone than backing up smaller files, and incremental backup can increase this risk by a factor of 10 to 1000. A single data transfer error or disk error can corrupt the entire file. If such corruption is not detected in time, customers may lose all their data just when they need it.
o With DriveHQ’s solution, the risk of data corruption is extremely low, and in the rare event that it happens, customers may lose only a single file.
- More flexible
o A regular Image Backup usually can be restored only to the same physical machine.
o Data backed up with DriveHQ Online Backup can be restored anywhere.
o If you move the VM to a different host server, you don’t need to reconfigure your backup task.
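The data-corruption point in the list above can be made concrete with a simple independent-error model. The sketch assumes that each gigabyte stored or transferred independently suffers an unrecoverable error with a fixed, hypothetical probability; real error behavior is not this uniform, and the numbers are illustrative, not measurements of DriveHQ or any other product. The 30 x 5GB increments and the 10MB file size are likewise assumed.

```python
def prob_file_intact(size_gb: float, p_error_per_gb: float) -> float:
    """Chance a file survives if every GB independently suffers an
    unrecoverable error with probability p_error_per_gb (assumed model)."""
    return (1 - p_error_per_gb) ** size_gb


def prob_chain_intact(full_gb: float, increment_gbs: list[float], p_error_per_gb: float) -> float:
    """A restore from a full backup plus unmerged increments succeeds only
    if the full backup and every increment are all intact."""
    result = prob_file_intact(full_gb, p_error_per_gb)
    for gb in increment_gbs:
        result *= prob_file_intact(gb, p_error_per_gb)
    return result


def expected_files_lost(file_sizes_gb: list[float], p_error_per_gb: float) -> float:
    """With many separate files, an error damages only the file it hits,
    so the expected damage is the sum of per-file loss probabilities."""
    return sum(1 - prob_file_intact(gb, p_error_per_gb) for gb in file_sizes_gb)


if __name__ == "__main__":
    p = 1e-4  # hypothetical error probability per GB, for illustration only
    print(f"100GB VHD lost entirely: {1 - prob_file_intact(100, p):.4%}")
    print(f"...plus 30 x 5GB increments: {1 - prob_chain_intact(100, [5] * 30, p):.4%}")
    print(f"10,000 x 10MB files, expected files lost: {expected_files_lost([0.01] * 10_000, p):.4f}")
```

Under this model, the chance of losing a file grows with its size, and each unmerged increment adds to the exposure of the restore chain. With one huge file, a single error costs the whole backup; with many small files, the expected damage at the same error rate is a small fraction of one file.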