Rebooting: mjg59

mjg59

Rebooting

May 31, 2011 12:43

You'd think it'd be easy to reboot a PC, wouldn't you? But then you'd also think that it'd be straightforward to convince people that at least making some effort to be nice to each other would be a mutually beneficial proposal, and look how well that's worked for us.

Linux has a bunch of different ways to reset an x86. Some of them are 32-bit only and so I'm just going to ignore them because honestly just what are you doing with your life. Also, they're horrible. So, that leaves us with five of them.

kbd - reboot via the keyboard controller. The original IBM PC had the CPU reset line tied to the keyboard controller. Writing the appropriate magic value pulses the line and the machine resets. This is all very straightforward, except for the fact that modern machines don't have keyboard controllers (they're actually part of the embedded controller) and even more modern machines don't even pretend to have a keyboard controller. Now, embedded controllers run software. And, as we all know, software is dreadful. But, worse, the software on the embedded controller has been written by BIOS authors. So clearly any pretence that this ever works is some kind of elaborate fiction. Some machines are very picky about hardware being in the exact state that Windows would program. Some machines work 9 times out of 10 and then lock up due to some odd timing issue. And others simply don't work at all. Hurrah!
triple - attempt to generate a triple fault. This is done by loading an empty interrupt descriptor table and then calling int(3). The interrupt fails (there's no IDT), the fault handler fails (there's no IDT) and the CPU enters a condition which should, in theory, then trigger a reset. Except there doesn't seem to be a requirement that this happen and it just doesn't work on a bunch of machines.
pci - not actually pci. Traditional PCI config space access is achieved by writing a 32 bit value to io port 0xcf8 to identify the bus, device, function and config register. Port 0xcfc then contains the register in question. But if you write the appropriate pair of magic values to 0xcf9, the machine will reboot. Spectacular! And not standardised in any way (certainly not part of the PCI spec), so different chipsets may have different requirements. Booo.
efi - EFI runtime services provide an entry point to reboot the machine. It usually even works! As long as EFI runtime services are working at all, which may be a stretch.
acpi - Recent versions of the ACPI spec let you provide an address (typically memory or system IO space) and a value to write there. The idea is that writing the value to the address resets the system. It turns out that doing so often fails. It's also impossible to represent the PCI reboot method via ACPI, because the PCI reboot method requires a pair of values and ACPI only gives you one.

Now, I'll admit that this all sounds pretty depressing. But people clearly sell computers with the expectation that they'll reboot correctly, so what's going on here?

A while back I did some tests with Windows running on top of qemu. This is a great way to evaluate OS behaviour, because you've got complete control of what's handed to the OS and what the OS tries to do to the hardware. And what I discovered was a little surprising. In the absence of an ACPI reboot vector, Windows will hit the keyboard controller, wait a while, hit it again and then give up. If an ACPI reboot vector is present, windows will poke it, try the keyboard controller, poke the ACPI vector again and try the keyboard controller one more time.

This turns out to be important. The first thing it means is that it generates two writes to the ACPI reboot vector. The second is that it leaves a gap between them while it's fiddling with the keyboard controller. And, shockingly, it turns out that on most systems the ACPI reboot vector points at 0xcf9 in system IO space. Even though most implementations nominally require two different values be written, it seems that this isn't a strict requirement and the ACPI method works.

3.0 will ship with this behaviour by default. It makes various machines work (some Apples, for instance), improves things on some others (some Thinkpads seem to sit around for extended periods of time otherwise) and hopefully avoids the need to add any more machine-specific quirks to the reboot code. There's still some divergence between us and Windows (mostly in how often we write to the keyboard controller), which can be cleaned up if it turns out to make a difference anywhere.

Now. Back to EFI bugs.

(

Comments |Comment on this)

advogato, fedora