A keen Gentoo user (Ronald) recently came onto IRC with a strange kernel problem: 2.6.15 uses only one of his CPUs although it sees both, while 2.6.14 works fine.
He was keen to investigate the problem himself so I thought now would be a good time to try the git bisect feature for the first time. There were approximately 5500 patches merged between 2.6.14 and 2.6.15, and this method found the offending patch in only 13 reboots (OK – still a high number, I said this guy was keen though!).
This feature is very effective and very easy to use, so I’m going to demonstrate it here. Linus also wrote a nice HOWTO on the topic.
You first have to clone the Linux kernel git repository. This is the same concept as checking out a repository from CVS.
# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
# cd linux-git
This directory looks like an ordinary kernel source distribution as you might expect in /usr/src/linux. It also has a hidden .git subdirectory which is where git stores its magical data. Tell git that we want to find a buggy patch through bisection:
# git bisect start
Next, tell git which kernel was the last known-working kernel, and which kernel is known to be not working. To identify these kernels, you can either use the long hexadecimal commit numbers, or you can abbreviate those numbers, or you can refer to tags directly. Linus tags every release with the version number, so the next step is as simple as:
# git bisect bad v2.6.15
# git bisect good v2.6.14
git now runs off and locates the commit exactly halfway between 2.6.14 and 2.6.15. It then “checks out” this tree, so the kernel in front of you is effectively a snapshot from halfway between 2.6.14 and 2.6.15.
Bisecting: 2705 revisions left to test after this
[dd0314f7bb407bc4bdb3ea769b9c8a3a5d39ffd7] fbcon: Initialize new driver when old driver is released
Build that kernel in the normal way, and reboot into it. It works, so we know the bug was introduced after this point. We inform git of this:
# git bisect good
git now discards the whole first half of the commits between 2.6.14 and 2.6.15 and bisects the remaining half (the changes after the point we just tested, but before 2.6.15). This is just a simple binary search. git now presents us with a new kernel snapshot (in this case, three quarters of the way between 2.6.14 and 2.6.15), which we have to test in the same way.
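To make the halving concrete, here is a minimal sketch of the same search in plain shell. It is only a model: it pretends the history is a straight list of commits numbered 1 to 5500 (real kernel history has merges, so git's own choice of midpoint is more involved), and both the function name `bisect_sim` and the position of the buggy commit (4000) are made up for illustration.

```shell
#!/bin/sh
# Simulate bisection over a linear list of commits numbered 1..total.
# 0 stands for the known-good release, total for the known-bad one.
# first_bad is a hypothetical commit that introduced the bug.
bisect_sim() {
    total=$1
    first_bad=$2
    good=0          # highest commit known to work
    bad=$total      # lowest commit known to fail
    tests=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))   # snapshot halfway through what is left
        tests=$((tests + 1))          # one build and reboot
        if [ "$mid" -ge "$first_bad" ]; then
            bad=$mid                  # plays the role of "git bisect bad"
        else
            good=$mid                 # plays the role of "git bisect good"
        fi
    done
    echo "first bad commit: $bad, found after $tests tests"
}

bisect_sim 5500 4000
```

In this simplified model the call prints `first bad commit: 4000, found after 13 tests`, which lines up with the 13 reboots mentioned earlier.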
A couple of kernels later, the search gives us a kernel which exhibits the bug. Telling git about this isn’t any harder:
# git bisect bad
The search continues, with the user telling git if the kernel was “good” or “bad” each time, and several reboots later we end up with the exact patch that introduced the bug:
# git bisect bad
cd8e2b48daee891011a4f21e2c62b210d24dcc9e is first bad commit
diff-tree cd8e2b48daee891011a4f21e2c62b210d24dcc9e (from d2149b542382bfc206cb28485108f6470c979566)
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date: Fri Oct 21 19:22:00 2005 -0400
[ACPI] fix 2.6.13 boot hang regression on HT box w/ broken BIOS
Ronald filed kernel bug 5930 about this. Usually this stuff is a nightmare to debug, and even though this did require many reboots to locate, it’s definitely a step in the right direction. You obviously need fewer bisections if you have a smaller range to test (e.g. if he’d known that 2.6.15-rc1 was OK and 2.6.15-rc4 was bad, some time would have been saved).
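If you want to try the workflow without building a dozen kernels, the following sketch runs a complete bisection on a throwaway repository. Everything in it is hypothetical toy data: the repository has 16 commits that each write a number to `n.txt`, the tag `v1` stands in for the last good release, and the "bug" is defined as the number reaching 11. It also assumes your git provides `git bisect run`, which calls a test command for you and treats exit status 0 as good and 1 as bad, so no manual `git bisect good` or `git bisect bad` is needed.

```shell
#!/bin/sh
# Run a full bisection on a toy repository and print the subject line
# of the first bad commit. The body runs in a subshell, so the caller's
# working directory is left alone.
bisect_demo() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    i=1
    while [ "$i" -le 16 ]; do
        echo "$i" > n.txt
        git add n.txt
        git -c user.name=demo -c user.email=demo@example.invalid \
            commit -q -m "commit $i"
        if [ "$i" -eq 1 ]; then git tag v1; fi   # the "last known good" tag
        i=$((i + 1))
    done

    # Bad = current HEAD (commit 16), good = tag v1 (commit 1).
    git bisect start HEAD v1 >/dev/null 2>&1
    # Test command: exit 0 means good, exit 1 means bad.
    # The pretend bug appears once n.txt reaches 11.
    git bisect run sh -c 'test "$(cat n.txt)" -lt 11' >/dev/null 2>&1
    # When the search is finished, refs/bisect/bad names the first bad commit.
    git log -1 --format=%s refs/bisect/bad
    git bisect reset >/dev/null 2>&1            # return to the original HEAD
    cd / && rm -rf "$repo"
)

bisect_demo
```

For a real kernel regression the test command cannot be this simple, since each step needs a build and a reboot, but the toy run is a cheap way to get comfortable with the commands before committing to 13 reboots.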
Probably wouldn’t save that much time though, because of the nature of binary search. Maybe 2 reboots.
I agree, if I had been able to shave off 75% of the commit range I would have saved a mere 2 reboots. I think rc1 to rc4 accounts for more than 25% of the changesets between the 2.6.14 and 2.6.15 releases.
The same holds the other way around: if I had only known that 2.6.11 was good and 2.6.15 bad, that would have added a mere 2 more reboots (every boot rules out around half of the remaining changesets).
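The arithmetic behind "a mere 2 reboots" can be checked with a few lines of shell. This is a rough sketch that again treats the history as a straight list of commits; `bisect_steps` is a made-up helper that returns the smallest k with 2^k >= n, and 5500 is the approximate figure quoted in the post.

```shell
#!/bin/sh
# Worst-case number of tests needed to bisect a linear range of n commits,
# i.e. the smallest k such that 2^k >= n.
bisect_steps() {
    n=$1
    k=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        k=$((k + 1))
    done
    echo "$k"
}

full=5500                 # approximate number of patches quoted in the post
quarter=$((full / 4))     # what is left after ruling out 75% up front
echo "full range:    $(bisect_steps "$full") tests"
echo "quarter range: $(bisect_steps "$quarter") tests"
```

Under these assumptions the full range needs 13 tests and the quarter range 11, so cutting the range to a quarter saves two tests. The same rule works in reverse: each doubling of the range costs one extra test.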
Now if only we could extend this and get git to use Newton’s method, which has *quadratic* convergence. Unfortunately the kernel isn’t a function, nor can we take its derivative.
I started to use this approach to fix these infernal crashes I’ve been experiencing for nearly a year… and I came to the horrible realisation that it wasn’t going to work because the last good kernel was 2.6.9 – before git was created. I’m doomed! I really have tried everything. I’m not alone though, another guy with a similar Vaio has the same problem.
You are not doomed. The old bitkeeper repository has been converted to git and you can find it here: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git/.
After bisecting I would like to compile a specific kernel version. How can I switch to a specific commit? With cogito I could do “$> cg-seek v2.6.18-rc3”, but the cogito scripts seem to be buggy, don’t they?
I’ve never used cogito. With git I would suggest you create a new branch (so that you do not lose the bisection state) and then hard reset to a tag.
# git branch mybranch
# git checkout mybranch
# git reset --hard v2.6.18-rc3
Thank you for this great guide :)
Last year, I ran into a v4l problem, and, as my first experience with git, I tried bisecting, but I ran into problems (don’t remember what, but it was more of an issue between me and git, than between me and the kernel), and decided just to download a bunch of tarballs and narrow it down. I managed to narrow it down to a 4-number release, so I had figured I’d go back into git and try to narrow it down further, but apparently there are no tags for the 4-number versions, and the commit hash (from the release notes, apparently written by Greg KH) was unknown to git on a clone of Linus’ repo.
So I gave up and just rewrote the app to use v4l2 (which fixed my issue), but I’m nevertheless curious what I really should have done with the additional info I had about the 4-number release… obviously if it happened now, I wouldn’t even bother with the tarballs, but it’d be nice not to have to build another 10+ kernels above and beyond the minimum if I ever did decide to pick up where I left off…