A keen Gentoo user (Ronald) recently came onto IRC with a strange kernel problem: 2.6.15 only uses one of his CPUs although it sees both, whereas 2.6.14 works fine.
He was keen to investigate the problem himself so I thought now would be a good time to try the git bisect feature for the first time. There were approximately 5500 patches merged between 2.6.14 and 2.6.15, and this method found the offending patch in only 13 reboots (OK – still a high number, I said this guy was keen though!).
This feature is very effective and very easy to use, so I’m going to demonstrate it here. Linus also wrote a nice HOWTO on the topic.
You first have to clone the Linux kernel git repository. This is the same concept as checking out a repository from CVS.
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
# cd linux-git
This directory looks like an ordinary kernel source distribution as you might expect in /usr/src/linux. It also has a hidden .git subdirectory which is where git stores its magical data. Tell git that we want to find a buggy patch through bisection:
# git bisect start
Next, tell git which kernel was the last known-working kernel, and which kernel is known to be not working. To identify these kernels, you can either use the long hexadecimal commit numbers, or you can abbreviate those numbers, or you can refer to tags directly. Linus tags every release with the version number, so the next step is as simple as:
# git bisect bad v2.6.15
# git bisect good v2.6.14
git now runs off and locates the commit roughly halfway between 2.6.14 and 2.6.15. It then “checks out” this tree, so the kernel in front of you is effectively a snapshot from halfway between 2.6.14 and 2.6.15.
Bisecting: 2705 revisions left to test after this
[dd0314f7bb407bc4bdb3ea769b9c8a3a5d39ffd7] fbcon: Initialize new driver when old driver is released
Build that kernel in the normal way, and reboot into it. It works, so we know the bug was introduced after this point. We inform git of this:
# git bisect good
git now discards the whole first half of commits between 2.6.14 and 2.6.15 and bisects the remaining half (the changes after the point we just tested, but before 2.6.15). This is just a simple binary search. git now presents us with a new kernel snapshot (in this case, three quarters of the way between 2.6.14 and 2.6.15) and we have to test this.
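As a rough sanity check on the 13 reboots mentioned earlier, here is a small shell sketch that counts how many halvings it takes to get from about 5500 candidate commits down to one. Note that bisect_steps is a made-up helper name, not a git command, and real kernel history is not a straight line, so git's own count can differ by a step or so.

```shell
# Estimate how many good/bad tests a bisection needs for N candidate commits.
# bisect_steps is a hypothetical helper for illustration, not part of git.
bisect_steps() {
    n=$1
    steps=0
    # Each test halves the remaining range (rounding up);
    # count the halvings until a single commit is left.
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 5500    # prints 13
```

Doubling the number of commits only adds one more test, which is why even a range of thousands of patches stays manageable.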
A couple of kernels later, the search gives us a kernel which exhibits the bug. Telling git about this isn’t any harder:
# git bisect bad
The search continues, with the user telling git if the kernel was “good” or “bad” each time, and several reboots later we end up with the exact patch that introduced the bug:
# git bisect bad
cd8e2b48daee891011a4f21e2c62b210d24dcc9e is first bad commit
diff-tree cd8e2b48daee891011a4f21e2c62b210d24dcc9e (from d2149b542382bfc206cb28485108f6470c979566)
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Fri Oct 21 19:22:00 2005 -0400
[ACPI] fix 2.6.13 boot hang regression on HT box w/ broken BIOS
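If you want to try the good/bad loop without rebooting anything, the whole procedure can be rehearsed on a throwaway repository. The sketch below is only an illustration: the file names, tag names, commit count and the position of the pretend bug are all invented, and a simple file check stands in for "build, reboot and see if both CPUs work". It also uses the one-line form git bisect start <bad> <good>, and finishes with git bisect reset to put the tree back where it started.

```shell
# Toy demonstration of the good/bad loop in a throwaway repository.
# Everything here (file names, tags, commit count) is made up for illustration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Bisect Demo"
git config user.email "demo@example.invalid"

# Create 16 commits; the pretend bug appears in commit 11 and stays.
i=1
while [ "$i" -le 16 ]; do
    if [ "$i" -ge 11 ]; then echo bad > status; else echo good > status; fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then git tag demo-good; fi
    i=$((i + 1))
done
git tag demo-bad

# "git bisect start <bad> <good>" marks both ends in one go.
out=$(git bisect start demo-bad demo-good)
tests=0
# Answer good/bad for each snapshot git checks out, as you would after each reboot.
while ! printf '%s\n' "$out" | grep -q 'first bad commit'; do
    if [ "$tests" -ge 20 ]; then break; fi   # safety net so the sketch cannot loop forever
    if grep -qx good status; then
        out=$(git bisect good)
    else
        out=$(git bisect bad)
    fi
    tests=$((tests + 1))
done

# refs/bisect/bad points at the earliest commit marked bad.
first_bad_subject=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $first_bad_subject (found in $tests tests)"

# Return the working tree to where it was before bisecting.
git bisect reset >/dev/null 2>&1
```

This should report "commit 11" as the culprit after a handful of tests. On the real kernel tree the same one-line start form would presumably work with release tags in place of the demo tags.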
Ronald filed kernel bug 5930 about this. Usually this stuff is a nightmare to debug, and even though this did require many reboots to locate, it’s definitely a step in the right direction. You need fewer bisection steps if you start with a smaller range (e.g. if he’d known that 2.6.15-rc1 was OK and 2.6.15-rc4 was bad, some time would have been saved).