
Wednesday, December 30, 2009

Why could my kernel panic?

A kernel panic is the most serious form of computer crash. It occurs when the OS kernel detects an internal inconsistency or error that cannot be automatically resolved. Since there is no way for the kernel to reliably request human intervention in these matters, it 'panics' and immediately interrupts the normal operation of the computer. What happens next depends on the kernel. Linux throws debugging information onto the screen and waits for a reboot. BeOS and NetBSD dump the user into the kernel debugger, which for the average user is equivalent to the Linux behavior. Windows NT displays the dreaded Blue Screen of Death. The one thing all kernel panics have in common is that a reboot is the only way to restore normal functionality. The actual term 'kernel panic' is normally only heard in the context of Unix and Unix-like systems; other systems have other terms for what is essentially the same thing.


A kernel panic can have several causes. One of the most common is bad hardware: if kernel memory is corrupted by a hardware fault, the kernel will likely panic. Kernel bugs can also cause panics, but bugs severe enough to do so should not be found in stable kernel versions, and even in beta kernels they should be rare. Panics can also occur during the boot sequence if the conditions for a successful boot are not met. Under *nix systems the most common such case is a root filesystem that cannot be mounted. In particular, if the root filesystem is remote and the connection times out, the kernel will panic. The kernel will also panic if init is not found on the root filesystem.
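The two boot-time conditions above can be modeled in a few lines. The Python sketch below is only a simplified model, not kernel code: KernelPanic, choose_init and their parameters are hypothetical names made up for illustration. The fallback list and the panic messages follow what Linux 2.4-era kernels do in init/main.c, as far as I know; other kernels and versions differ.

```python
class KernelPanic(Exception):
    """Stands in for a call to the kernel's panic() (hypothetical name)."""


# Fallback order tried by Linux 2.4-era kernels after any init= argument.
INIT_CANDIDATES = ["/sbin/init", "/etc/init", "/bin/init", "/bin/sh"]


def choose_init(root_mounted, files_on_root, init_arg=None):
    """Return the path the model would run as init, or raise KernelPanic.

    root_mounted  -- whether mounting the root filesystem succeeded
    files_on_root -- set of executable paths present on the root filesystem
    init_arg      -- optional path given with init= on the kernel command line
    """
    if not root_mounted:
        # No root filesystem means there is nothing to run: give up.
        raise KernelPanic("VFS: Unable to mount root fs")

    candidates = ([init_arg] if init_arg else []) + INIT_CANDIDATES
    for path in candidates:
        if path in files_on_root:
            return path

    # Every candidate was missing, so no first process can be started.
    raise KernelPanic("No init found. Try passing init= option to kernel.")
```

In this model, choose_init(True, {"/bin/sh"}) falls back to the shell, while an empty root filesystem or a failed mount raises KernelPanic, mirroring the two boot failures described above.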


The Linux kernel defines the mechanism for a panic as a function panic() in kernel/panic.c. In version 2.4.19, 416 other source files contain calls to panic(). Kernel subsystems panic for the following reasons:
* Failure of a memory allocation or structure creation which should always succeed
* Unrecoverable filesystem errors. Some filesystems can have their error recovery mechanisms set to panic when a normally non-fatal but possibly serious error occurs.
* Task exit during an interrupt handler (the infamous "Aiee, killing interrupt handler!" message)
* Complete memory exhaustion (distinct from the allocation failures above, though related)
* Locking failure
* Missing hardware features or serious hardware exceptions
* Interrupt glitches, including sleeping interrupt handlers
* Premature and unexpected destruction of kernel structures
* Failure to load essential drivers
* SMP concurrency errors
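
The count of source files that call panic() can be checked roughly against any unpacked kernel tree. The Python sketch below is one way to do it, under some loudly stated assumptions: the function name files_calling_panic and the choice of file suffixes are mine, and the regular expression is a crude heuristic that also matches panic( inside comments and strings, as well as the definition in kernel/panic.c itself. Do not expect it to reproduce the figure above exactly.

```python
import os
import re

# "panic(" as a standalone identifier: nmi_panic( and panic_timeout do not match.
PANIC_CALL = re.compile(r"(?<![A-Za-z0-9_])panic\s*\(")


def files_calling_panic(root, suffixes=(".c", ".h", ".S")):
    """Return sorted paths (relative to root) of source files containing 'panic('."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                # latin-1 decodes any byte sequence, so odd encodings cannot fail.
                with open(path, encoding="latin-1") as f:
                    text = f.read()
            except OSError:
                continue  # unreadable file or dangling symlink: skip it
            if PANIC_CALL.search(text):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)
```

Given a tree unpacked at a hypothetical path such as linux-2.4.19, len(files_calling_panic("linux-2.4.19")) gives the number of matching files, and the returned list shows which subsystems they belong to.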
