How To Copy a Disk Image

1 How To Copy a Disk Image

1.1 The Scenario

Suppose you have several old computers with smallish disks, and you want to make backup copies of everything. This is sometimes called mirroring the disks, or cloning the disks.

We assume you have bought a new large disk, and you want all the backups to go there. So technically at this stage you are not making exact clones; you are just making bit-for-bit copies. On the other hand, when it comes time to do a restore from backup, you will be in a position to make exact clones if you want.

If you are accustomed to using Microsoft tools, you’ve got a problem, because they don’t work very well for this task. In particular, they don’t behave well if they encounter an open file, and there are always open files on your disk if you’re running Windows.

You can solve this problem by making yourself a Knoppix CD. See reference 1. That provides a complete Linux system all on one CD. Download the latest Knoppix CD image (in .iso format) and burn it to CD. Then boot directly from that CD and use the Linux tools to make your backups. (Note that during the backup process, Windows is not running, so there are no open files. You are guaranteed a “clean” backup.)

On the Knoppix download site, the .iso.md5 files are checksums.¹ You can apply the md5sum command to the downloaded .iso image. If the checksum agrees with what the Knoppix site says it should be, you know every byte of the downloaded file is correct. On Windows, there are various freeware "md5sum" programs available so you can checksum your download before burning it to CDROM. You can also checksum the resulting CDROM to make sure the burn went OK.
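The checksum check described above can be sketched as follows. This is only an illustration under stated assumptions: a small scratch file named demo.iso (a hypothetical name) stands in for the real download, and the .md5 file is generated locally, whereas for a real download it would come from the download site.

```shell
#!/bin/sh
# Sketch: verify a file against a checksum file with md5sum -c.
# demo.iso is a hypothetical stand-in for the real Knoppix image.
work=$(mktemp -d)
printf 'pretend this is a CD image\n' > "$work/demo.iso"

# For a real download, the .md5 file comes from the download site.
( cd "$work" && md5sum demo.iso > demo.iso.md5 )

# md5sum -c re-computes the checksum and compares it with the recorded one.
if ( cd "$work" && md5sum -c demo.iso.md5 >/dev/null 2>&1 ); then
    good_copy=ok
else
    good_copy=bad
fi

# Overwrite the file to simulate a damaged download; the check should fail.
printf 'damaged\n' > "$work/demo.iso"
if ( cd "$work" && md5sum -c demo.iso.md5 >/dev/null 2>&1 ); then
    damaged_copy=ok
else
    damaged_copy=bad
fi

echo "intact file: $good_copy, damaged file: $damaged_copy"
rm -rf "$work"
```

The sha256sum command accepts the same -c option, so the same pattern applies if the site publishes a stronger checksum.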

Of course if you are already running Linux, life is even simpler.

If you just want to copy the files, there are easy-to-use commands such as rsync. On the other hand, sometimes you want an exact low-level bit-for-bit copy of the disks, preserving not just the files but the exact layout of the files on disk, including deleted files, hidden files, swap areas, and everything else. The latter scenario is the main topic of this document.

1.2 Rationale

There are numerous reasons why you might want to make low-level bit-for-bit copies.

- Suppose you have a disk with an invalid filestructure on one of the partitions. This can happen due to a software error in the operating system, and it can happen if the hardware is powered down at an inopportune time (on a system that isn’t smart enough to implement a journaling filesystem), and it can happen due to bad blocks on the disk surface.
  In such a case, the first thing you should do is make a high-quality bit-for-bit copy of the disk. This is important, because the programs that repair filestructures have a tendency to sacrifice some files. Sometimes they sacrifice only as much as necessary, but sometimes seemingly more than necessary. If you have a backup, you can try one recovery program, and then (if you don’t like the results) try another.
  Also, if there are bad blocks due to a deteriorating disk surface, the problem is likely to get worse, possibly soon, possibly catastrophically.
- If you need to recover a deleted file, stop using the disk as soon as possible. Make a backup, and then search the backup for the deleted information.
  The longer you use the disk before making the backup, the less your chance of recovering the deleted file.
- If the disk is infected with a virus or the like, you should stop using it as soon as possible. Make a backup so that the nature of the virus can be investigated. Do not trust the infected machine to make a reliable backup of itself. This is an example of a “forensic” backup.
- There are other reasons why you might need a forensic backup, for instance if you think a person (rather than a virus) has been doing naughty things with the computer. You may need to look for hidden files, deleted files, directory information such as file ownership, and other nits that are not captured if you just copy files in the prosaic way.

2 Partitions

If you’ve never thought about disk partitions, now is the time to get with the program.

Here is a slightly simplified view of how a hard disk is laid out. This is standard across all reasonable operating systems and also Microsoft.

Preamble, containing typically about 63 "hidden sectors", most of which are used for boot code. The first of these sectors is called the MBR (master boot record, as described in reference 2) and contains:
- the first smidgen of boot code
- the partition table
- the volume label
Partition 1
Partition 2
Partition 3
et cetera.

The whole disk – i.e. all of the above – is called the volume.

Internal hard disks connected via an IDE cable (ATA interface) will be named as follows:

/dev/hda        first cable, master drive
/dev/hdb        first cable, slave drive
/dev/hdc        second cable, master drive
/dev/hdd        second cable, slave drive

Often the CDROM is connected via an ATA interface and looks like /dev/hdc. Don’t worry about it.

Hard disks connected via SCSI cables or USB cables will be called /dev/sda, /dev/sdb et cetera, "sd" standing for "SCSI DISK". USB is certainly not the same as SCSI but is handled by the same drivers and inherits the /dev/sd? naming system.

On Linux, volumes and partitions are named according to the following scheme:

/dev/hda        the whole volume, hard disk "a"
/dev/hda1       partition #1 on hard disk "a"
/dev/hda2       partition #2 on hard disk "a"

You can verify the layout of your disk with the command:

      sfdisk -uS -l /dev/hda

If the listing shows the first partition starting at sector 63, you can confirm this as follows:

     dd if=/dev/hda skip=0 count=1  2>/dev/null | md5sum -
03103fbfd1eff446f5401c81020c0125  -
     dd if=/dev/hda1 skip=0 count=1 2>/dev/null | md5sum -
7689349395e7f77d7eea90afa575d03a  -
     dd if=/dev/hda skip=63 count=1 2>/dev/null | md5sum -
7689349395e7f77d7eea90afa575d03a  -

Note the agreement between the last two md5sums. Sector 63 of the volume is sector 0 of the first partition. Another edificational command is

        dd if=/dev/hda | file -
/dev/stdin: x86 boot sector; partition 1: ID=0xc, active, starthead 1, startsector 63, 24579387 sectors; partition 2: ID=0xc, starthead 254, startsector 24579450, 8980335 sectors; partition 3: ID=0x5, starthead 254, startsector 33559785, 122736600 sectors, code offset 0x48
        dd if=/dev/hda1 | file -
/dev/stdin: x86 boot sector, Microsoft Windows XP Bootloader NTLDR, code offset 0x58, OEM-ID "MSWIN4.1", sectors/cluster 64, reserved sectors 110, Media descriptor 0xf8, heads 240, hidden sectors 63, sectors 24579387 (volumes > 32 MB) , FAT (32 bit), sectors/FAT 3001, rootdir cluster 9901, reserved3 0x800000, serial number 0x344010f6, unlabeled
        dd if=/dev/hda6 | file -
/dev/stdin: Linux rev 1.0 ext3 filesystem data (needs journal recovery)

which tells you in words what type of sector is found at the given location on the disk.
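The correspondence between sector 63 of the volume and sector 0 of the first partition can also be tried without touching a real disk. The following is a sketch only: both the "volume" and the "partition" are scratch files, and the 63-sector preamble is simply assumed to match the example layout.

```shell
#!/bin/sh
# Sketch: sector 63 of a "volume" is sector 0 of its first "partition".
# Both are scratch files here; no real disk is read or written.
vol=$(mktemp)
part=$(mktemp)

# Eight sectors of random data play the role of the first partition.
head -c 4096 /dev/urandom > "$part"

# The volume is 63 zero-filled preamble sectors followed by the partition.
dd if=/dev/zero of="$vol" bs=512 count=63 2>/dev/null
cat "$part" >> "$vol"

sum_vol=$(dd if="$vol" bs=512 skip=63 count=1 2>/dev/null | md5sum)
sum_part=$(dd if="$part" bs=512 skip=0 count=1 2>/dev/null | md5sum)

echo "volume sector 63:    $sum_vol"
echo "partition sector 0:  $sum_part"
rm -f "$vol" "$part"
```

The two printed checksums should agree, just as the last two md5sums agree in the listing for the real disk.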

Microsoft users typically don’t use more than one of the partitions on a given disk. Nowadays many manufacturers put a small, semi-secret “recovery” system in the first partition. Then they assign almost all of the remaining disk space to the second partition, and use that as the MS “C:” drive. That is to say, typically the Linux /dev/hda2 partition corresponds to the entire “C:” drive.

A volume or a partition, as implemented by the /dev/??? interface, is just a bucket of bits. No filestructure is imposed at this level.

3 Mounting A Partition

If you want a filestructure, you need to

- create a filestructure on the given partition (using the mkfs command), or copy one there (using dd), and then
- mount the partition on a suitable mount-point, using the "mount" command.

Any empty² directory can serve as mount point. Example:

       mount /dev/sda1 /bob
       mount /dev/sda2 /carol
       mount /dev/sda3 /ted

However it is conventional and sensible, for present purposes, to use less colorful and more systematic names, to wit:

       mount /dev/sda1 /mnt/sda1
       mount /dev/sda2 /mnt/sda2
       mount /dev/sda3 /mnt/sda3

And at this point, if you look in (say) /mnt/sda1 you will see filestructure (files, directories, et cetera). You must NEVER write to the partition’s bit-bucket interface (e.g. /dev/sda1) while the partition is mounted.
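One way to guard against writing to a mounted partition is to look the device up in the mount table first. The sketch below assumes a Linux system where /proc/mounts lists mounted devices; the helper name is_mounted is hypothetical, and a sample mount table is used so that nothing needs to be mounted to try it.

```shell
#!/bin/sh
# Sketch: refuse to write to a device that appears in the mount table.
# is_mounted DEVICE [MOUNTS_FILE]  (hypothetical helper, not a standard command)
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# A sample mount table, so the sketch can be tried without mounting anything.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/dev/sda1 /mnt/sda1 ext3 rw 0 0
/dev/sda3 /mnt/sda3 vfat ro 0 0
EOF

for dev in /dev/sda1 /dev/sda2; do
    if is_mounted "$dev" "$sample"; then
        echo "$dev is mounted: do not write to it"
    else
        echo "$dev is not listed as mounted"
    fi
done
rm -f "$sample"
```

Called with one argument, the helper reads /proc/mounts. A device can be listed there under a different name than the one you typed, so "not listed" is weaker evidence than "listed"; treat it as a sanity check, not a guarantee.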

4 Partition-to-File Backups

Consider the following contrast:

Partition-to-partition: You need to adjust “by hand” the size of the partition on the backup disk, so that it matches the size of the partition on the old disk. It is not always possible to do this exactly.
Partition-to-file: The file system will make the backup file the right size, automatically.

Partition-to-partition: You can’t easily give partitions mnemonic names, or group them into directories and subdirectories.
Partition-to-file: You can give nice mnemonic names to files, and group them into directories and subdirectories.

Partition-to-partition: There is no way to associate dates, ownership, or permissions with partitions.
Partition-to-file: Files have dates, ownership, permissions, et cetera.

Partition-to-partition: If the backup disk has N partitions, you can’t recover unused space by deleting partitions you’re not using, except the Nth one, without the risk of severe fragmentation.
Partition-to-file: You can delete files at will, and the space will be recovered.

Partition-to-partition: Dealing with partitions is laborious, unfamiliar, and error-prone.
Partition-to-file: Dealing with files is convenient, familiar, and much less error-prone.

Partition-to-partition: If you do everything just right with a partition-to-partition copy, it is maximally efficient.
Partition-to-file: A partition-to-file copy incurs small amounts of overhead, such as a few percent increase in disk space required.

Partition-to-partition: You can mount the backup partition and peruse the files therein.
Partition-to-file: Using the loopback device, you can mount the backup file and peruse the files therein.

So you can see that there is a rather strong argument for doing partition-to-file backups.

5 Procedure

Before doing anything irreversible, we need to do some reconnaissance. To obtain a listing of all disks attached to the system, try this:

        sfdisk -uS  -l 2>/dev/null |  grep ^Disk

Then use the cfdisk command to look at each of the disks in turn. For example, if sda exists, it’s a good place to start:

        cfdisk /dev/sda

This disk might be your external backup disk – or it might be the hard disk inside your computer. You should be able to figure out what’s what by looking at the names and sizes of the partitions.

If /dev/sda turns out to be the internal hard drive, it is likely that /dev/sdb is the external backup disk. Conversely, if /dev/sda turns out to be the external backup disk, it is likely that /dev/hda is the internal hard drive. Use cfdisk to make sure.

Once you know what’s what, the next step is to partition the backup disk. By way of example, suppose each of your old disks is 200 gigs, and suppose your backup drive is 500 gigs. You can’t put three of the old disks on the backup disk, but you can put two with room left over.

To preserve the generality (and reliability) of this document, we are going to call the backup volume $bk. To partition the backup disk, again apply the cfdisk command to that disk:

        bk=sda   # for example;  change this to suit your system
        cfdisk /dev/$bk

Ordinarily I wouldn’t recommend creating any huge partitions, but in this case huge is appropriate, because of the huge size of the files we will be creating. You can create one giant partition that covers all the available space on the backup disk. Or you can create a few reasonable-sized partitions at the front of the backup disk, for whatever purpose (perhaps for installing an operating system, if you want to make the backup disk bootable) ... and then create one more partition, a huge partition, with all the remaining space. Make it big enough to hold your old data, plus a few percent for overhead. In our example, 420 GB should be fine.
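The sizing arithmetic behind the 420 GB figure can be written out as below. This is a sketch: the 5 percent overhead is an assumed value for "a few percent", and the variable names are made up for the illustration.

```shell
#!/bin/sh
# Sketch of the sizing arithmetic; sizes are in GB, as in the example.
old_gb=200        # size of each old disk
backup_gb=500     # size of the backup disk
overhead_pct=5    # assumed value for "a few percent"

# Space one old disk needs on the backup disk, including overhead.
per_disk=$(( old_gb * (100 + overhead_pct) / 100 ))
# How many old disks fit, and how big the backup partition must be for them.
fit=$(( backup_gb / per_disk ))
need_gb=$(( fit * per_disk ))

echo "disks that fit: $fit, partition size needed: $need_gb GB"
```

With these numbers, two disks fit and the partition needs 420 GB, which matches the example.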

While still in cfdisk, use the T command to set the type of the new backup partition to type 83 (standard Linux filesystem). Use the W command (capital W) to write your handiwork out to the disk. Confirm with “yes” spelled out with three letters. (If you try to use “y” as an abbreviation for “yes” it will not be accepted.)

We hereby assign the name $bkp to the backup partition. For example, if it is partition #1 on /dev/$bk, the definition is:

        bkp=${bk}1   # for example;  change this to suit your system

At this point, the /dev/$bkp partition is just a big space on the disk, full of uninitialized garbage. You need to format it to make it usable as a filesystem. Use the mkfs command (“make filesystem”) – or better yet the mkfs.ext3 command – which will create a new, empty filesystem on the new partition:

    oldhost="cassandra"
    mkfs.ext3 /dev/$bkp
    e2label /dev/$bkp "$oldhost backup"

Then you can mount the partition. Let’s assume that Knoppix has already created³ the mountpoint (/mnt/$bkp) that we want to use.

    mount /dev/$bkp /mnt/$bkp

At this point, you can copy the old /dev/$oldp partition (as opposed to the /dev/$old volume) to a file on the backup disk. Choose an informative filename; for example, if the old computer was named cassandra, the command might be

   old=hda         # for example;  change this to suit your system
   oldp=${old}1    # for example;  change this to suit your system
   dd if=/dev/$oldp of=/mnt/$bkp/$oldhost-$oldp.ntfs

If there are other partitions, the commands to copy them are analogous.
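After a copy like this, it is cheap to check that the backup file really matches the source. The following sketch uses scratch files in place of /dev/$oldp and the backup file, so it can be run safely; the bs=64k block size is an assumption chosen for the illustration, not a recommendation from the procedure above.

```shell
#!/bin/sh
# Sketch: copy a "partition" to a file with dd, then check the copy.
# A scratch file stands in for /dev/$oldp; nothing real is touched.
src=$(mktemp)              # plays the role of /dev/$oldp
dst=$(mktemp)              # plays the role of the backup file
head -c 1048576 /dev/urandom > "$src"

# A larger block size than dd's 512-byte default usually speeds things up.
dd if="$src" of="$dst" bs=64k 2>/dev/null

# cmp exits with status 0 only if the two are byte-for-byte identical.
if cmp -s "$src" "$dst"; then
    copy_status=identical
else
    copy_status=different
fi
echo "copy is $copy_status"
rm -f "$src" "$dst"
```

On the real system the equivalent check would be cmp /dev/$oldp /mnt/$bkp/$oldhost-$oldp.ntfs, run while the old partition is still unmounted.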

It wouldn’t hurt to capture the disk partition information in an easy-to-read file:

      sfdisk -uS -l /dev/$old | tee /mnt/$bkp/$oldhost.partition

Last but not least, you need to copy the MBR and the other stuff that comes before the first partition:

   dd if=/dev/$old  count=63  of=/mnt/$bkp/$oldhost.mbr

Make sure the “63” in the previous line agrees with the actual size of the preamble on your old disk, as determined by the aforementioned sfdisk -uS -l /dev/$old command.
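As a cross-check on the sfdisk listing, the start sector of the first partition can be read directly from the partition table in the MBR. The sketch below builds a fake MBR in a scratch file so that it runs without a real disk. It relies on the classic MBR layout (partition table at byte 446, 16-byte entries, start sector in bytes 8 to 11 of each entry, little-endian); disks that use a different partitioning scheme are outside its scope.

```shell
#!/bin/sh
# Sketch: read the start sector of partition 1 straight from an MBR.
# In the MBR, the partition table begins at byte 446; each entry is 16
# bytes, and bytes 8-11 of an entry hold the start sector (little-endian).
mbr=$(mktemp)

# Build a fake 512-byte MBR whose first entry starts at sector 63 (octal 077).
dd if=/dev/zero of="$mbr" bs=512 count=1 2>/dev/null
printf '\077\000\000\000' | dd of="$mbr" bs=1 seek=454 conv=notrunc 2>/dev/null

# Read the four bytes individually so the result does not depend on the
# byte order of the machine running the script.
set -- $(od -An -tu1 -j454 -N4 "$mbr")
start=$(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))

echo "partition 1 starts at sector $start"
rm -f "$mbr"
```

Pointing the od command at /dev/$old instead of the scratch file should give the number to use as the count when copying the preamble.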

As an alternative, if you’re really lazy, you can just copy the entire old volume /dev/$old to a file on the backup disk. If you do this, you will need to use the "losetup -o 32256" option when perusing the files, as discussed in section 6. Hint: 512×63=32256.
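The offset arithmetic, and the way a whole-volume image splits into preamble and partition, can be sketched as follows. The image here is a scratch file with an assumed 63-sector preamble and a single 16-sector partition; the variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: the byte offset of a partition inside a whole-volume image,
# and carving the image into preamble and partition with dd.
sector_size=512
start_sector=63
offset=$(( sector_size * start_sector ))
echo "byte offset of partition 1: $offset"

# Scratch whole-volume image: 63 preamble sectors plus a 16-sector partition.
vol=$(mktemp); pre=$(mktemp); part=$(mktemp); rejoined=$(mktemp)
head -c $(( (start_sector + 16) * sector_size )) /dev/urandom > "$vol"

# count= keeps the first 63 sectors; skip= drops them.
dd if="$vol" of="$pre"  bs=$sector_size count=$start_sector 2>/dev/null
dd if="$vol" of="$part" bs=$sector_size skip=$start_sector  2>/dev/null

# Gluing the two pieces back together should reproduce the image exactly.
cat "$pre" "$part" > "$rejoined"
if cmp -s "$vol" "$rejoined"; then pieces=consistent; else pieces=inconsistent; fi
echo "pieces are $pieces"
rm -f "$vol" "$pre" "$part" "$rejoined"
```

On a volume with several partitions, the dd that extracts one partition would also need a count= equal to that partition's length in sectors.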

6 Perusing the Backed-Up Files

You may want to look at the files on the backed-up partition. Here is one way to do it. Let’s assume loopback device /dev/loop0 is available. On some systems you can ask for the first free loopback device with losetup -f, and on other systems you can list the loopback devices already in use with losetup -a, but we are just going to assume /dev/loop0 is available. It really should be; if you’ve been using it you should know you’ve been using it.

## Take the backup file we just created,
## and attach it to the loopback device:
 losetup /dev/loop0 /mnt/sda1/cassandra-hda1.ntfs
## Choose a suitable directory name, and
## create an empty dir to use as mountpoint:
 install -d /mnt/$oldhost-$oldp
 mount -o ro /dev/loop0 /mnt/$oldhost-$oldp
  # peruse all your files on /mnt/$oldhost-$oldp
  # then, when you are through playing,
  # put away all the toys:
 umount /dev/loop0
 losetup -d /dev/loop0  ## release the loop device

Note that when you dismount a partition (using the umount command) the mountpoint goes back to being a plain old directory. There is no need to delete it; having a few empty directories lying around is harmless.

7 Restoring from Backup

Making a backup isn’t very useful unless you know how to restore things from backup.

If you just want to restore a file or two, the methods of section 6 are probably what you want.

On the other hand, if you want to restore everything, so as to create an exact clone of the disk you started with, here is the procedure.

Beware that from now on, most of the commands will destroy whatever information was previously on hda. Be careful. Constructive suggestion: have two windows open, one logged in as root, and the other logged in as some unprivileged account (user joe, group joe). Use root privileges to change the group permissions so that joe can write to /dev/hda without giving him permission to write to any of the other disks.
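The permission change suggested above might look like the following. This is a sketch under assumptions: a scratch file stands in for /dev/hda and the current user's own group stands in for group joe, so that it can be tried without root.

```shell
#!/bin/sh
# Sketch: give one group read/write access to a single "device".
# A scratch file stands in for /dev/hda, and the current user's own
# group stands in for group joe, so this runs without root.
dev=$(mktemp)
grp=$(id -g)

chgrp "$grp" "$dev"     # on the real system: chgrp joe /dev/hda
chmod 660 "$dev"        # owner and group may read and write; others may not

perms=$(ls -l "$dev" | cut -c1-10)
echo "permissions: $perms"
rm -f "$dev"
```

Because only the one device node is changed, a mistyped device name in joe's window fails with a permission error instead of overwriting another disk.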

First, copy the .mbr file from the backup disk to the clone-to-be ... and then notify the operating system that the partition table has changed:

       new=hda            # for example
       newp=${new}1       # for example
       dd if=/mnt/$bkp/$oldhost.mbr of=/dev/$new
       sfdisk -R /dev/$new # re-read partition table (if your sfdisk lacks -R, use: blockdev --rereadpt /dev/$new)

Now you can restore the partitions one by one in the usual way:

       dd if=/mnt/$bkp/$oldhost-$oldp.ntfs of=/dev/$newp
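Before and after a restore, two checks are worth making: that the target is at least as large as the image, and that the restored bytes match the image. The sketch below shows both using scratch files in place of the backup file and /dev/$newp; the sizes and the bs=64k block size are assumptions for the illustration.

```shell
#!/bin/sh
# Sketch: restore an image onto a larger target, then verify the result.
# Scratch files stand in for the backup file and for /dev/$newp.
img=$(mktemp)      # plays the role of the backup file
target=$(mktemp)   # plays the role of the new partition
head -c 65536 /dev/urandom > "$img"
dd if=/dev/zero of="$target" bs=1024 count=100 2>/dev/null   # 100 KiB target

img_size=$(( $(wc -c < "$img") ))
target_size=$(( $(wc -c < "$target") ))

# Never restore onto a target that is smaller than the image.
if [ "$target_size" -lt "$img_size" ]; then
    echo "target too small" >&2
    exit 1
fi

# conv=notrunc keeps the scratch file at its full size, as a real
# partition would stay; it is harmless on a real block device.
dd if="$img" of="$target" bs=64k conv=notrunc 2>/dev/null

# Compare only the first img_size bytes; the rest of the target is unused.
if head -c "$img_size" "$target" | cmp -s - "$img"; then
    restore_status=verified
else
    restore_status=mismatch
fi
echo "restore $restore_status"
rm -f "$img" "$target"
```

For a real partition, blockdev --getsize64 /dev/$newp reports the size in bytes without reading the whole device, which wc -c would have to do.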

8 References

1.: Welcome to Knoppix http://www.knoppix.net/
2.: Wikipedia Article: “Master Boot Record” http://en.wikipedia.org/wiki/Master_boot_record

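1: Beware that md5 is not secure against deliberate tampering, and SHA-1 is no longer considered secure either. A SHA-256 checksum (checked with sha256sum) is preferable when the download site offers one.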
2: Actually a non-empty directory will work, but I don’t recommend it, because the files that were in that directory will become inaccessible until the device is unmounted.
3: If the mountpoint does not already exist, you can create it using install -d /mnt/$bkp or equivalently a string of mkdir commands.
