Suppose you have several old computers with smallish disks, and you want to make backup copies of everything. This is sometimes called mirroring the disks, or cloning the disks.
We assume you have bought a new large disk, and you want all the backups to go there. So technically at this stage you are not making exact clones; you are just making bit-for-bit copies. On the other hand, when it comes time to do a restore from backup, you will be in a position to make exact clones if you want.
If you are accustomed to using Microsoft tools, you’ve got a problem, because they don’t work very well for this task. In particular, they don’t behave well if they encounter an open file, and there are always open files on your disk if you’re running Windows.
You can solve this problem by making yourself a Knoppix CD. See reference 1. That provides a complete Linux system all on one CD. Download the latest Knoppix CD image (in .iso format) and burn it to CD. Then boot directly from that CD and use the Linux tools to make your backups. (Note that during the backup process, Windows is not running, so there are no open files. You are guaranteed a “clean” backup.)
On the Knoppix download site, the .iso.md5 files are checksums (see footnote 1). You can apply the md5sum command to the downloaded .iso image. If the checksum agrees with what the Knoppix site says it should be, you know every byte of the downloaded file is correct. On Windows, there are various freeware "md5sum" programs available, so you can checksum your download before burning it to CDROM. You can also checksum the resulting CDROM to make sure the burn went OK.
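A minimal sketch of the checksum comparison, using a small stand-in file rather than a real Knoppix image. The file names are hypothetical; with a real download you would use the .iso and the matching .iso.md5 file from the site.

```sh
# Stand-in for a downloaded image; the real file would be the Knoppix .iso.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pretend this is a CD image\n' > example.iso

# The download site publishes a file like this next to the image.
md5sum example.iso > example.iso.md5

# Recompute the checksum and compare it with the published one.
if md5sum -c example.iso.md5 >/dev/null 2>&1; then
    verdict=match
else
    verdict=MISMATCH
fi
echo "checksum $verdict"

# Simulate a damaged download: the check should now fail.
printf 'x' >> example.iso
if md5sum -c example.iso.md5 >/dev/null 2>&1; then
    damaged=match
else
    damaged=MISMATCH
fi
echo "after damage: $damaged"

cd / && rm -rf "$workdir"
```

The -c option tells md5sum to read the expected checksum from the .md5 file and report whether the named file still matches it.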
Of course if you are already running Linux, life is even simpler.
If you just want to copy the files, there are easy-to-use commands such as rsync. On the other hand, sometimes you want an exact low-level bit-for-bit copy of the disks, preserving not just the files but the exact layout of the files on disk, including deleted files, hidden files, swap areas, and everything else. The latter scenario is the main topic of this document.
There are numerous reasons why you might want to make low-level bit-for-bit copies.
In such a case, the first thing you should do is make a high-quality bit-for-bit copy of the disk. This is important, because the programs that repair filestructures have a tendency to sacrifice some files. Sometimes they sacrifice only as much as necessary, but sometimes seemingly more than necessary. If you have a backup, you can try one recovery program, and then (if you don’t like the results) try another.
Also, if there are bad blocks due to a deteriorating disk surface, the problem is likely to get worse, possibly soon, possibly catastrophically.
The longer you use the disk before making the backup, the lower your chance of recovering the deleted file.
If you’ve never thought about disk partitions, now is the time to get with the program.
Here is a slightly simplified view of how a hard disk is laid out. This is standard across all reasonable operating systems and also Microsoft.
The whole disk – i.e. all of the above – is called the volume.
Internal hard disks connected via an IDE cable (ATA interface) will be named as follows:
/dev/hda    first cable, master drive
/dev/hdb    first cable, slave drive
/dev/hdc    second cable, master drive
/dev/hdd    second cable, slave drive
Often the CDROM is connected via an ATA interface and looks like /dev/hdc. Don’t worry about it.
Hard disks connected via SCSI cables or USB cables will be called /dev/sda, /dev/sdb et cetera, "sd" standing for "SCSI DISK". USB is certainly not the same as SCSI but is handled by the same drivers and inherits the /dev/sd? naming system.
On Linux, volumes and partitions are named according to the following scheme:
/dev/hda     the whole volume, hard disk "a"
/dev/hda1    partition #1 on hard disk "a"
/dev/hda2    partition #2 on hard disk "a"
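To make the naming scheme concrete, here is a small sketch that splits a partition name into its volume and its partition number. The helper name split_partition is made up for this illustration, and it assumes the classic hdX/sdX pattern described above (letters followed by digits).

```sh
# Split a partition name such as "hda2" into its volume and partition number.
# Assumes the classic hdX/sdX naming described above (letters, then digits).
split_partition() {
    name=$1
    number=${name##*[!0-9]}      # trailing digits, e.g. "2"
    volume=${name%"$number"}     # everything before them, e.g. "hda"
    echo "/dev/$volume $number"
}

split_partition hda2    # prints: /dev/hda 2
split_partition sdb11   # prints: /dev/sdb 11
```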
You can verify the layout of your disk with the command:
sfdisk -uS -l /dev/hda
You can confirm this as follows:
dd if=/dev/hda skip=0 count=1 2>/dev/null | md5sum -
03103fbfd1eff446f5401c81020c0125  -
dd if=/dev/hda1 skip=0 count=1 2>/dev/null | md5sum -
7689349395e7f77d7eea90afa575d03a  -
dd if=/dev/hda skip=63 count=1 2>/dev/null | md5sum -
7689349395e7f77d7eea90afa575d03a  -
Note the agreement between the last two md5sums. Sector 63 of the volume is sector 0 of the first partition. Another edifying command is
dd if=/dev/hda | file -
/dev/stdin: x86 boot sector; partition 1: ID=0xc, active, starthead 1, startsector 63, 24579387 sectors; partition 2: ID=0xc, starthead 254, startsector 24579450, 8980335 sectors; partition 3: ID=0x5, starthead 254, startsector 33559785, 122736600 sectors, code offset 0x48
dd if=/dev/hda1 | file -
/dev/stdin: x86 boot sector, Microsoft Windows XP Bootloader NTLDR, code offset 0x58, OEM-ID "MSWIN4.1", sectors/cluster 64, reserved sectors 110, Media descriptor 0xf8, heads 240, hidden sectors 63, sectors 24579387 (volumes > 32 MB) , FAT (32 bit), sectors/FAT 3001, rootdir cluster 9901, reserved3 0x800000, serial number 0x344010f6, unlabeled
dd if=/dev/hda6 | file -
/dev/stdin: Linux rev 1.0 ext3 filesystem data (needs journal recovery)
which tells you in words what type of sector is found at the given location on the disk.
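The agreement between the md5sums noted above can be reproduced without touching a real disk. The sketch below builds a small stand-in "volume" out of ordinary files; the file names and the 10-sector partition size are arbitrary choices for the illustration.

```sh
# Build a tiny stand-in "volume": 63 sectors of preamble, then a 10-sector "partition".
workdir=$(mktemp -d)
vol="$workdir/volume.img"
part="$workdir/partition.img"

# Fill the partition with recognizable, non-zero content (10 x 512 bytes).
yes 'partition data' | head -c 5120 > "$part"

# Preamble of 63 zeroed sectors, followed by the partition contents.
dd if=/dev/zero of="$vol" bs=512 count=63 2>/dev/null
cat "$part" >> "$vol"

# Sector 63 of the volume should match sector 0 of the partition.
sum_vol=$(dd if="$vol" bs=512 skip=63 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
sum_part=$(dd if="$part" bs=512 skip=0 count=1 2>/dev/null | md5sum | cut -d' ' -f1)
sum_mbr=$(dd if="$vol" bs=512 skip=0 count=1 2>/dev/null | md5sum | cut -d' ' -f1)

echo "volume sector 63:   $sum_vol"
echo "partition sector 0: $sum_part"
echo "volume sector 0:    $sum_mbr"

rm -rf "$workdir"
```

The first two checksums should agree with each other and differ from the third, mirroring what happens on a real disk.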
Microsoft users typically don’t use more than one of the partitions on a given disk. Nowadays many manufacturers put a small, semi-secret “recovery” system in the first partition. Then they assign almost all of the remaining disk space to the second partition, and use that as the MS “C:” drive. That is to say, typically the Linux /dev/hda2 partition corresponds to the entire “C:” drive.
A volume or a partition, as implemented by the /dev/??? interface, is just a bucket of bits. No filestructure is imposed at this level.
If you want a filestructure, you need to
Any empty directory (see footnote 2) can serve as a mount point. Example:
mount /dev/sda1 /bob
mount /dev/sda2 /carol
mount /dev/sda3 /ted
However it is conventional and sensible, for present purposes, to use less colorful and more systematic names, to wit:
mount /dev/sda1 /mnt/sda1
mount /dev/sda2 /mnt/sda2
mount /dev/sda3 /mnt/sda3
And at this point, if you look in (say) /mnt/sda1 you will see filestructure (files, directories, et cetera). You must NEVER write to the partition’s bit-bucket interface (e.g. /dev/sda1) while the partition is mounted.
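One way to guard against writing to a mounted partition is to check the kernel's mount table first. This is only a sketch: the function name is_mounted is invented here, and it assumes a Linux system where /proc/mounts lists one mount per line with the device in the first field.

```sh
# Return success if the given device appears in a mounts table.
# The table defaults to /proc/mounts; a second argument lets you try
# the function against a sample file instead.
is_mounted() {
    device=$1
    table=${2:-/proc/mounts}
    # The first field of each line is the device; compare it exactly.
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit !found }' "$table"
}

# Example against a hand-written sample table (hypothetical contents).
sample=$(mktemp)
printf '%s\n' '/dev/sda1 /mnt/sda1 ext3 rw 0 0' > "$sample"

if is_mounted /dev/sda1 "$sample"; then
    echo "/dev/sda1 is mounted: do not write to it"
fi
if ! is_mounted /dev/sda2 "$sample"; then
    echo "/dev/sda2 is not mounted"
fi

rm -f "$sample"
```

In real use you would call is_mounted with just the device name before running any dd command whose of= points at that device.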
Consider the following contrast:
Partition-to-Partition Backup | Partition-to-File Backup |
You need to adjust “by hand” the size of the partition on the backup disk, so that it matches the size of the partition on the old disk. It is not always possible to do this exactly. | The file system will make the backup file the right size, automatically. |
You can’t easily give partitions mnemonic names, or group them into directories and subdirectories. | You can give nice mnemonic names to files, and group them into directories and subdirectories. |
There is no way to associate dates, ownership, or permissions with partitions. | Files have dates, ownership, permissions, et cetera. |
If the backup disk has N partitions, you can’t recover unused space by deleting partitions you’re not using, except the Nth one, without the risk of severe fragmentation. | You can delete files at will, and the space will be recovered. |
Dealing with partitions is laborious, unfamiliar, and error-prone. | Dealing with files is convenient, familiar, and much less error-prone. |
If you do everything just right with a partition-to-partition copy, it is maximally efficient. | A partition-to-file copy incurs small amounts of overhead, such as a few percent increase in disk space required. |
You can mount the backup partition and peruse the files therein. | Using the loopback device, you can mount the backup file and peruse the files therein. |
So you can see that there is a rather strong argument for doing partition-to-file backups.
Before doing anything irreversible, we need to do some reconnaissance. To obtain a listing of all disks attached to the system, try this:
sfdisk -uS -l 2>/dev/null | grep ^Disk
Then use the cfdisk command to look at each of the disks in turn. For example, if sda exists, it’s a good place to start:
cfdisk /dev/sda
This disk might be your external backup disk – or it might be the hard disk inside your computer. You should be able to figure out what’s what by looking at the names and sizes of the partitions.
If /dev/sda turns out to be the internal hard drive, it is likely that /dev/sdb is the external backup disk. Conversely, if /dev/sda turns out to be the external backup disk, it is likely that /dev/hda is the internal hard drive. Use cfdisk to make sure.
Once you know what’s what, the next step is to partition the backup disk. By way of example, suppose each of your old disks is 200 gigs, and suppose your backup drive is 500 gigs. You can’t put three of the old disks on the backup disk, but you can put two with room left over.
To preserve the generality (and reliability) of this document, we are going to call the backup volume $bk. To partition the backup disk, again apply the cfdisk command to that disk:
bk=sda # for example; change this to suit your system
cfdisk /dev/$bk
Ordinarily I wouldn’t recommend creating any huge partitions, but in this case huge is appropriate, because of the huge size of the files we will be creating. You can create one giant partition that covers all the available space on the backup disk. Or you can create a few reasonable-sized partitions at the front of the backup disk, for whatever purpose (perhaps for installing an operating system, if you want to make the backup disk bootable) ... and then create one more partition, a huge partition, with all the remaining space. Make it big enough to hold your old data, plus a few percent for overhead. In our example, 420 GB should be fine.
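The sizing arithmetic in this example can be checked with a few lines of shell. The 5% overhead figure is an assumption standing in for "a few percent"; the disk sizes come from the example above.

```sh
old_disk_gb=200      # size of each old disk, from the example above
backup_gb=500        # size of the backup drive, from the example above
overhead_percent=5   # assumed value for "a few percent"

# How many old disks fit on the backup drive, before overhead?
fit=$(( backup_gb / old_disk_gb ))

# Space needed for that many images, including overhead.
needed_gb=$(( fit * old_disk_gb * (100 + overhead_percent) / 100 ))

echo "disks that fit: $fit"
echo "partition size needed: ${needed_gb} GB"
echo "left over: $(( backup_gb - needed_gb )) GB"
```

With these numbers the script reports 2 disks, a 420 GB partition, and 80 GB left over.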
While still in cfdisk, use the T command to set the type of the new backup partition to type 83 (standard Linux filesystem). Use the W command (capital W) to write your handiwork out to the disk. Confirm with “yes” spelled out with three letters. (If you try to use “y” as an abbreviation for “yes” it will not be accepted.)
We hereby assign the name $bkp to the backup partition. For example, if it is partition #1 on /dev/$bk, the definition is:
bkp=${bk}1 # for example; change this to suit your system
At this point, the /dev/$bkp partition is just a big space on the disk, full of uninitialized garbage. You need to format it to make it usable as a filesystem. Use the mkfs command (“make filesystem”), or better yet the mkfs.ext3 command, which will create a new, empty filesystem on the new partition:
oldhost="cassandra"
mkfs.ext3 /dev/$bkp
e2label /dev/$bkp "$oldhost backup"
Then you can mount the partition. Let’s assume that Knoppix has already created the mountpoint (/mnt/$bkp) that we want to use (see footnote 3).
mount /dev/$bkp /mnt/$bkp
At this point, you can copy the old /dev/$oldp partition (as opposed to the /dev/$old volume) to a file on the backup disk. Choose an informative filename; for example, if the old computer was named cassandra, the command might be
old=hda # for example; change this to suit your system
oldp=${old}1 # for example; change this to suit your system
dd if=/dev/$oldp of=/mnt/$bkp/$oldhost-$oldp.ntfs
If there are other partitions, the commands to copy them are analogous.
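It can be reassuring to confirm that a copy really is bit-for-bit identical to its source. The sketch below does this with ordinary files standing in for /dev/$oldp and /mnt/$bkp, so it can be tried without root. The file names and the 64 KiB block size are illustrative choices, not tuned values.

```sh
# Stand-ins: an ordinary file plays the role of /dev/$oldp, and a
# temporary directory plays the role of /mnt/$bkp.
workdir=$(mktemp -d)
source_dev="$workdir/fake-partition"
backup_file="$workdir/cassandra-hda1.img"
yes 'old disk contents' | head -c 1048576 > "$source_dev"

# Copy in 64 KiB blocks instead of dd's default 512 bytes.
dd if="$source_dev" of="$backup_file" bs=64k 2>/dev/null

# Verify: cmp -s is silent and succeeds only if the two are byte-identical.
if cmp -s "$source_dev" "$backup_file"; then
    copy_status=identical
else
    copy_status=different
fi
echo "backup is $copy_status"

rm -rf "$workdir"
```

On a real system the same cmp check works with the device as the first argument and the backup file as the second. For a disk with bad blocks, GNU dd also accepts conv=noerror,sync, which continues past read errors and pads the unreadable blocks with zeros.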
It wouldn’t hurt to capture the disk partition information in an easy-to-read file:
sfdisk -uS -l /dev/$old | tee /mnt/$bkp/$oldhost.partition
Last but not least, you need to copy the MBR and the other stuff that comes before the first partition:
dd if=/dev/$old count=63 of=/mnt/$bkp/$oldhost.mbr
Make sure the “63” in the previous line agrees with the actual size of the preamble on your old disk, as determined by the aforementioned sfdisk -uS -l /dev/$old command.
As an alternative, if you’re really lazy, you can just copy the entire old volume /dev/$old to a file on the backup disk. If you do this, you will need to use the "losetup -o 32256" option when perusing the files, as discussed in section 6. Hint: 512×63=32256.
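The offset in the hint is just the start sector times the sector size, so it can be computed rather than memorized. In this sketch the whole-volume image path is hypothetical, and the losetup command is printed rather than run because it needs root.

```sh
sector_size=512    # bytes per sector, as in the hint above
start_sector=63    # start of the first partition; check yours with sfdisk -uS -l

offset=$(( sector_size * start_sector ))
echo "byte offset of first partition: $offset"

# The corresponding command, printed rather than executed.
# The image path is a hypothetical example.
echo "losetup -o $offset /dev/loop0 /mnt/sda1/cassandra-hda.img"
```

If the first partition on your disk starts somewhere other than sector 63, change start_sector accordingly.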
You may want to look at the files on the backed-up partition. Here is one way to do it. On some systems you can find the first available loopback device with losetup -f, and on other systems you can list the loopback devices in use with losetup -a. Here we simply assume /dev/loop0 is available. It really should be; if you’ve been using it, you should know you’ve been using it.
## Take the backup file we just created,
## and attach it to the loopback device:
losetup /dev/loop0 /mnt/sda1/cassandra-hda1.ntfs
## Choose a suitable directory name, and
## create an empty dir to use as mountpoint:
install -d /mnt/$oldhost-$oldp
mount -o ro /dev/loop0 /mnt/$oldhost-$oldp
# peruse all your files on /mnt/$oldhost-$oldp
# then, when you are through playing,
# put away all the toys:
umount /dev/loop0
losetup -d /dev/loop0 ## release the loop device
Note that when you dismount a partition (using the umount command) the mountpoint goes back to being a plain old directory. There is no need to delete it; having a few empty directories lying around is harmless.
Making a backup isn’t very useful unless you know how to restore things from backup.
If you just want to restore a file or two, the methods of section 6 are probably what you want.
On the other hand, if you want to restore everything, so as to create an exact clone of the disk you started with, here is the procedure.
Beware that from now on, most of the commands will destroy whatever information was previously on hda. Be careful. Constructive suggestion: have two windows open, one logged in as root, and the other logged in as some unprivileged account (user joe, group joe). Use root privileges to change the group permissions so that joe can write to /dev/hda without giving him permission to write to any of the other disks.
First, copy the .mbr file from the backup disk to the clone-to-be ... and then notify the operating system that the partition table has changed:
new=hda # for example
newp=${new}1 # for example
dd if=/mnt/$bkp/$oldhost.mbr of=/dev/$new
sfdisk -R /dev/$new # re-read partition table
Now you can restore the partitions one by one in the usual way:
dd if=/mnt/$bkp/$oldhost-$oldp.ntfs of=/dev/$newp
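The two restore steps can be rehearsed on ordinary files before trying them on a real disk. In the sketch below every file name is a stand-in, the sizes are arbitrary, and the "new disk" is a single image file, so the partition is written at sector 63 with seek= instead of going to a separate /dev/$newp device.

```sh
# Stand-ins for the backup files and for the new disk (all ordinary files).
workdir=$(mktemp -d)
mbr_file="$workdir/cassandra.mbr"
part_file="$workdir/cassandra-hda1.img"
new_disk="$workdir/new-disk.img"

yes 'preamble' | head -c $(( 63 * 512 )) > "$mbr_file"       # 63-sector preamble
yes 'partition' | head -c $(( 100 * 512 )) > "$part_file"    # 100-sector partition
dd if=/dev/zero of="$new_disk" bs=512 count=200 2>/dev/null  # blank 200-sector "disk"

# Step 1: restore the preamble to the start of the disk.
# conv=notrunc matters only because the target here is a regular file.
dd if="$mbr_file" of="$new_disk" bs=512 conv=notrunc 2>/dev/null

# Step 2: restore the partition, starting at sector 63 of the disk image.
dd if="$part_file" of="$new_disk" bs=512 seek=63 conv=notrunc 2>/dev/null

# Verify both regions by reading them back and comparing with the backups.
if dd if="$new_disk" bs=512 count=63 2>/dev/null | cmp -s - "$mbr_file"; then
    mbr_ok=yes
else
    mbr_ok=no
fi
if dd if="$new_disk" bs=512 skip=63 count=100 2>/dev/null | cmp -s - "$part_file"; then
    part_ok=yes
else
    part_ok=no
fi
echo "preamble restored: $mbr_ok, partition restored: $part_ok"

rm -rf "$workdir"
```

The same read-back comparison can be applied after a real restore, using the device names in place of the image file.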