
Thursday, May 27, 2010

Common boot problems

Can't Boot?

Watch the system closely as it boots and note any error messages that appear. If, for example, the system complains that it is unable to mount the root filesystem, there are several possible causes:


* The BIOS cannot find the boot loader. This sometimes happens after you've installed Linux to dual-boot with Windows but, to avoid
misconfiguring the system, asked the install program to place the boot loader in the Linux root (or /boot) filesystem.
The BIOS can't see the boot loader there unless you make that partition the active partition. The simplest fix is to reinstall
Linux and, this time, let it place the LILO or GRUB boot loader in the Master Boot Record. Don't worry: the Linux boot loaders are
automatically set up to let you choose Linux or Windows at boot time. A more complex fix is also possible, for example
copying the Linux boot loader sector into a file and setting up the Windows NT/2K/XP boot loader to chain to it, but that is too
complex to describe here
(see http://www.lesbell.com.au/Home.nsf/web/Using+the+NT+Boot+Loader+to+Boot+Linux?OpenDocument where you'll find a longer article
describing how to use the NT boot loader to boot Linux).
* The kernel doesn't have a device driver to access the hard drive (e.g. a SCSI drive). Fix this by using the mkinitrd script to build a new
initrd file that contains the correct drivers, or recompile the kernel to include the driver code. This usually happens because you've
built a new kernel and slightly messed up the configuration.
* The kernel doesn't have a filesystem driver to access the root partition. For example, if the root filesystem is formatted with ext3,
then you will need the ext3 and jbd modules in the initrd or compiled into the kernel. Fix as for the previous problem. Again, this
usually happens after building a new kernel.
* The partition table has been modified, for example, by the installation of another operating system. In this case, edit the kernel
command line (in /etc/lilo.conf or /boot/grub/menu.lst) and the contents of /etc/fstab to contain the correct entries. If you edit
/etc/lilo.conf, remember to rerun the lilo command so that the change takes effect.
* Filesystems are corrupted, due to a power failure or system crash. Generally, after a system crash or power outage (what? No UPS?),
the system will come up and repair itself. If you are using a journalling filesystem like ext3fs, jfs, xfs or reiserfs, it will usually
perform a roll-forward recovery from its journal file and carry on. Even with the older ext2fs, the system usually runs an fsck
(file system check) on the various file systems and repairs them automatically. However, just occasionally manual intervention is
required; you might have to answer 'Y' to a string of questions (answering 'N' will get you nowhere unless you intend to
perform really low-level repairs yourself in a last-ditch attempt to avoid data loss). In the worst case, you might have to reboot from
rescue media and manually run the e2fsck (or similar) command against each filesystem in turn. For example:

# e2fsck -p /dev/hda7

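If the program complains that the superblock (the master block that links to everything else) is corrupted, it is useful to
remember that the superblock is so critical that backup copies are kept throughout the filesystem. On a filesystem with 1 KB blocks
the copies are spaced 8192 blocks apart, so the first backup is at block 8193; with larger blocks the spacing is larger (the first
backup is at block 32768 with 4 KB blocks). You can tell e2fsck to use one of the backups: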

# e2fsck -b 8193 /dev/hda7
* One or more filesystems cannot be found and mounted: Check the contents of /etc/fstab - in making quick alterations here, typographical
errors are common. You can use the e2label command to view the label of each filesystem: some distributions set these to the mount point
so you can figure out what is what.
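
The last two fixes above lend themselves to a couple of small helpers. What follows is only a rough sketch, not part of
e2fsprogs or any distribution: the function names first_backup_superblock and check_fstab_mount_points are made up for
illustration. The first assumes the filesystem was created with the mke2fs default of 8 x block-size blocks per group; the
second only checks that each mount point listed in an fstab-style file exists as a directory, which catches the more common typos.

```shell
#!/bin/sh
# Sketch only: these function names are hypothetical helpers,
# not commands shipped with e2fsprogs.

# Print the block number of the first backup superblock of an ext2/ext3
# filesystem, ASSUMING mke2fs defaults (blocks per group = 8 * block size).
# With 1 KB blocks the first data block is block 1, hence the extra +1.
first_backup_superblock() {
    block_size=$1
    case "$block_size" in
        1024) echo $((block_size * 8 + 1)) ;;
        2048|4096) echo $((block_size * 8)) ;;
        *) echo "unsupported block size: $block_size" >&2; return 1 ;;
    esac
}

# List entries in an fstab-style file whose mount point is not an
# existing directory. Comments, blank lines and swap entries are skipped.
check_fstab_mount_points() {
    fstab=${1:-/etc/fstab}
    while read -r device mount_point fs_type rest || [ -n "$device" ]; do
        case "$device" in
            ''|'#'*) continue ;;
        esac
        case "$mount_point" in
            none|swap) continue ;;
        esac
        if [ ! -d "$mount_point" ]; then
            echo "missing mount point: $mount_point ($device)"
        fi
    done < "$fstab"
}
```

With these loaded into a rescue shell, something like e2fsck -b "$(first_backup_superblock 1024)" /dev/hda7 would try the first
backup on a 1 KB-block filesystem, and check_fstab_mount_points /etc/fstab would print nothing if every mount point exists. If the
filesystem was not created with default settings, the computed block number will be wrong; in that case mke2fs -n (which only
prints what it would do, without creating anything) can list the backup locations, provided it is given the same parameters that
were used when the filesystem was created.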
