Archive for the ‘Linux kernel’ Category

Using git-bisect to find buggy kernel patches

Saturday, January 21st, 2006

A keen Gentoo user (Ronald) recently came onto IRC with a strange kernel problem: 2.6.15 uses only one of his CPUs although it sees both, while 2.6.14 works fine.

He was keen to investigate the problem himself so I thought now would be a good time to try the git bisect feature for the first time. There were approximately 5500 patches merged between 2.6.14 and 2.6.15, and this method found the offending patch in only 13 reboots (OK - still a high number, I said this guy was keen though!).

This feature is very effective and very easy to use, so I’m going to demonstrate it here. Linus also wrote a nice HOWTO on the topic.

You first have to clone the Linux kernel git repository. This is the same concept as checking out a repository from CVS.

# git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-git
# cd linux-git

This directory looks like an ordinary kernel source distribution as you might expect in /usr/src/linux. It also has a hidden .git subdirectory which is where git stores its magical data. Tell git that we want to find a buggy patch through bisection:

# git bisect start

Next, tell git which kernel was the last known-working kernel, and which kernel is known to be not working. To identify these kernels, you can either use the long hexadecimal commit numbers, or you can abbreviate those numbers, or you can refer to tags directly. Linus tags every release with the version number, so the next step is as simple as:

# git bisect bad v2.6.15
# git bisect good v2.6.14

git now runs off and locates the commit exactly halfway between 2.6.14 and 2.6.15. It then “checks out” this tree, so the kernel in front of you is effectively a snapshot from halfway in between 2.6.14 and 2.6.15.

Bisecting: 2705 revisions left to test after this
[dd0314f7bb407bc4bdb3ea769b9c8a3a5d39ffd7] fbcon: Initialize new driver when old driver is released

Build that kernel in the normal way, and reboot into it. It works, so we know the bug was introduced after this point. We inform git of this:

# git bisect good

git now discards the whole first half of commits between 2.6.14 and 2.6.15 and bisects the remaining half (the changes after the point we just tested, but before 2.6.15). This is just a simple binary search. git now presents us with a new kernel snapshot (in this case, three quarters of the way between 2.6.14 and 2.6.15) and we have to test this.
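To make the binary search concrete, here is a minimal sketch in plain shell that simulates the process over a straight line of numbered commits. It is only an illustration: the function name simulate_bisect and the position of the bad commit (4000) are made up, and real kernel history is not a straight line, so git's own step count can differ slightly.

```shell
# Simulate bisection over a linear history of commits numbered 1..n.
# Commit 0 stands for the known-good release, commit n for the known-bad one.
# first_bad is the (hypothetical) commit that introduced the bug.
simulate_bisect() {
    n=$1
    first_bad=$2
    good=0
    bad=$n
    tests=0
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        tests=$((tests + 1))
        if [ "$mid" -ge "$first_bad" ]; then
            bad=$mid      # like "git bisect bad": the bug is at or before mid
        else
            good=$mid     # like "git bisect good": the bug came in after mid
        fi
    done
    echo "first bad commit: $bad, kernels tested: $tests"
}

# Roughly 5500 commits, with the bug assumed to sit at commit 4000.
simulate_bisect 5500 4000
```

Each test halves the remaining range, which is why roughly 5500 commits need about 13 kernels rather than thousands.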

A couple of kernels later, the search gives us a kernel which exhibits the bug. Telling git about this isn’t any harder:

# git bisect bad

The search continues, with the user telling git if the kernel was “good” or “bad” each time, and several reboots later we end up with the exact patch that introduced the bug:

# git bisect bad
cd8e2b48daee891011a4f21e2c62b210d24dcc9e is first bad commit
diff-tree cd8e2b48daee891011a4f21e2c62b210d24dcc9e (from d2149b542382bfc206cb28485108f6470c979566)
Author: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Date:   Fri Oct 21 19:22:00 2005 -0400
    [ACPI] fix 2.6.13 boot hang regression on HT box w/ broken BIOS

Ronald filed kernel bug 5930 about this. Usually this stuff is a nightmare to debug, and even though this did require many reboots to locate, it’s definitely a step in the right direction. The number of bisections you need to do is obviously less if you have a smaller range to test (e.g. if he’d known that 2.6.15-rc1 was OK and 2.6.15-rc4 was bad, some time would have been saved).

Alauda driver merged

Thursday, January 19th, 2006

Just a quick entry to point out that my Alauda driver project (a driver for certain XD and SmartMedia card readers) has been merged upstream into Linux 2.6.16-rc1, and will be included and supported in all future releases. That’s one driver out of the way!

Recent hacking on VIA 82Cxxx IDE driver

Thursday, December 15th, 2005

I’ve recently been doing some hacking on the Linux VIA IDE driver. The driver now supports multiple controllers on the same system (if such a situation could ever exist) and support has been added for the VT6410 and VT8251 chipsets.

All of this is included in Linux 2.6.15.

A side effect of this work is that we had to drop the /proc/ide/via file, a dynamic text file which provided some advanced information and stats about the VIA IDE hardware present on the system, and the attached disks. This kind of querying should be done in userspace anyway, so I have produced viaideinfo to do the same thing. viaideinfo has been tested by quite a few people and is now at version 0.3. It is in portage as sys-block/viaideinfo.

Alauda driver is complete

Tuesday, November 8th, 2005

The driver

I completed development of the Alauda driver in September and submitted the driver for inclusion to the Linux kernel.

The driver duplicates some code (checksum, media ID table) which is also present in other drivers, and it looks like we want to figure out a good way to share this code before including my work, which is a fair point - we don’t want to duplicate this yet again.

For now I’m publishing my driver as a standalone patch which people can use until we figure out the real integration details. These devices seem to be more common than I originally thought. Patch available here (against Linux 2.6.14).

Juice Box

In true open-source style, a group of hackers have taken my work and used it for something I didn’t design for: hacking the Juice Box - a portable media player based on uClinux.

From what I gather, these devices boot from a small amount of NAND flash. To customise the device to a decent level, you need to replace this flash with your own.

Fortunately, xD media is basically NAND flash with a slightly different pin configuration, so they have done crazy things such as solder a pre-programmed XD card to the PCB.

However, you can’t just pre-program these XD cards on any old reader/writer. You need to use a device which gives you access to the physical block layout of the media, so that you can write to block 0 (amongst other things). Almost all XD reader/writer devices on the market handle physical block translation in hardware, and only provide a logical block interface to the host operating system, which does not satisfy the needs of these hackers.

The Alauda is probably the most common device that provides physical access, making writing a driver considerably harder, but allowing you to hack the media in ways such as this. I may even donate my spare Alauda device to their project.

Alauda driver now reads all XD

Monday, September 5th, 2005

Quick update on the Alauda driver status:

Figured out the rest of the block addressing, so it can now support more card sizes. It also should detect the media size automatically and work “out of the box”, at least it does with the two XD cards I have here.

XD media reading is now pretty much complete, except for a few performance improvements which will be made at a later date. Next up I’ll be getting my hands on some SmartMedia and implementing read support for that.

Code is available from SourceForge CVS.

Fuji DPC-R1 is an Alauda!

Tuesday, August 9th, 2005

I have been making some fast progress with the Alauda driver project.

I discovered that the Fuji DPC-R1 is also an Alauda device, so my driver should result in this device being supported by Linux, as well as the Olympus MAUSB-10 which I have in my possession.

I’ve gone over several logs of sniffed data and documented what I think it’s doing on the new sourceforge project page: control commands and bulk commands.

The first thing I want to achieve is the reading of my XD media cards. The Alauda only provides raw access to the media, so I basically have to interpret XD in software. This is made difficult because XD is closed-spec and it doesn’t look like anybody has done it in software before (well, not open-source!), but fortunately it is similar to SmartMedia (the SDDR09 Linux driver handles “raw SM” in software, and this is proving useful to look at). I’ve published what I think might be something vaguely like the XD Media specification.

Once I’ve figured out a few more details about XD media, I should be able to get reading working relatively easily. I’ve got the driver foundations in place already.

I’m hoping that I’ll hear from some other MAUSB-10/DPC-R1 users soon to get additional testing when I actually have some code which does something useful.

Alauda MAUSB-10

Saturday, August 6th, 2005

Got my hands on an Olympus MAUSB-10.

It’s a USB media card reader (2 slots, SmartMedia and XD-Media). It uses a vendor-specific interface and protocol, and is currently unsupported by Linux.

I’m going to be slowly developing a driver for this device. The Windows driver shows that the driver is actually for an “Alauda Enumerator” chip, maybe manufactured by RATOC? I’m going to be a copycat and call my driver Alauda.

I’ve fired off an email to Olympus requesting technical documentation for the device, but I’m doubtful that I’ll get anything - I’ll probably have to reverse engineer it from scratch.

Some techy info: the device seems to use a combined Control/Bulk transport, where bulk is used for data transfer and control is used for everything else (get media status, etc). Commands seem to be transmitted via bulk (after control setup) and hopefully they are something standardised e.g. SCSI/ATA, but I haven’t had time to investigate this just yet.

If anyone else owns one of these and would be interested in development/testing, please email me.

I will upload sniffed logs from the Windows driver and an initial protocol analysis sometime soon.

GWN Kernel Feedback

Wednesday, July 20th, 2005

In this week’s Gentoo newsletter we included a request for feedback on possibly stopping development of gentoo-sources-2.4 and removing it from the tree.

John M has already written about the new plan of action in response to this feedback, but as we have received a fair quantity of interesting feedback, I thought it might be worth sharing some of it.

Most of the mails are from people who apparently misread the article, thinking that we are dropping Linux 2.4 completely. People were very keen to remind us of certain software and hardware which doesn’t work on 2.6.

One user thought he was doomed to 2.4 simply because his hardware is unsupported in any kernel; however, Promise do provide 2.4-only drivers on their website. Fortunately, all that is needed is a small patch to get this working on recent 2.6 kernels, so Linux should support Promise TX4200 SATA controllers very soon.

Some users reported that they are stuck to 2.4 because of their wireless cards - support is not available in 2.6, and the vendors insist on producing drivers for 2.4 only.

We suspect that the other alien hardware incompatibilities that were reported to us are already supported in 2.6, but we have yet to receive the information to confirm this.

A few users reminded us that our OpenAFS ebuilds are so outdated that they don’t run on 2.6 at all. Fortunately, Seemant assures us that he’s been fuelling his production line and a new OpenAFS package maintainer should be ready for public consumption sometime soon.

We received a couple of more interesting mails, for example this guy, who ‘gets’ how Gentoo works:

I migrated our desktops and servers from 2.4 to 2.6 a number of months ago, so I encourage discontinuation of 2.4. The more people on 2.6 the easier it will be to maintain.

Notice in the above email he referred to “our desktops”. Many of the emails were written in this tone - from a corporate environment. People saying “we use”, “we migrated”, “…a big concern for us”, etc. Doesn’t anyone use Gentoo just because it’s fun anymore? :)

Another user pointed out that Gentoo rocks because it’s very easy to still be using a 2.4 kernel without also having to run packages that are a year out of date.

That’s about all so far. Thanks for all the feedback, we’ll probably get something in next week’s newsletter to conclude.

Linux 2.6.12

Sunday, June 19th, 2005

Since there’s the usual chorus of “what’s new?” I thought it might be worth posting some things which I’ve observed since 2.6.11:

  • My driver is finally included!
  • Intel HD audio driver
  • ALSA now uses dmix (software mixing) by default even if you have not configured it
  • ALPS touchpad (included on many laptops) driver rework
  • Multipath device mapper
  • Improved SATA support
  • The firewire subsystem finally saw some action
  • Plenty of new hardware support, lots of things that I missed
  • Lots of bug fixes and internal improvements

The first gentoo-sources-2.6.12 release has been added to the tree and features the new inotify (0.23-13) as well as the new fbsplash (0.9.2-r3). Since we moved our patches into Gentoo’s shiny new Subversion server, the website generation is broken, but this will be updated soon. Also note that your custom udev rules might be broken until you upgrade to udev-058.

The definitive guide to CD writing on recent 2.6 kernels

Sunday, April 17th, 2005

Despite the fact it happened a few months ago, I still see people claiming that it is impossible to write CDs with recent 2.6 kernels due to the internal changes. This is not true!

The first thing to check is that you aren’t using SCSI emulation. You don’t need this in Linux 2.6 - it’s ugly, and apparently doesn’t work. Disable SCSI emulation and use the new ATAPI features to burn “directly” (no extra configuration options needed other than general ATAPI cdrom support).
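One way to check for SCSI emulation is to look at your kernel configuration. The sketch below assumes that the ide-scsi emulation driver is controlled by the CONFIG_BLK_DEV_IDESCSI option and that your kernel config lives at /usr/src/linux/.config; both are assumptions on my part, and the function name check_ide_scsi is made up.

```shell
# Report whether a kernel .config enables ide-scsi emulation.
# CONFIG_BLK_DEV_IDESCSI is assumed to be the relevant option name.
check_ide_scsi() {
    config=$1
    if grep -q '^CONFIG_BLK_DEV_IDESCSI=[ym]' "$config"; then
        echo "SCSI emulation is enabled: disable it"
    else
        echo "SCSI emulation is not enabled"
    fi
}

# Usage (the path is an assumption):
#   check_ide_scsi /usr/src/linux/.config
```

This only inspects the config file you point it at, so make sure it matches the kernel you are actually running.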

Next up, check you are running the latest version of cdrtools/cdrecord, even if this means going into the testing/unstable tree. The latest at the moment is cdrtools-2.01.01_alpha01-r1. The same applies for any other cd recording software you might be using.

It’s also important that you are running the latest stable kernel. This is currently 2.6.11. If you aren’t running this, then you are in no position to complain that things don’t work or file bugs.

A few months ago, the kernel code changed to restrict the commands which can be sent to CD writers. The restriction only allows commands that are related to CD reading and writing. The commands are listed in a table. At first, the table was too small, but thanks to user feedback I helped expand it to cover all commands needed for CD writing.

It seems that many users keep advocating a patch which completely bypasses the command table - and this patch is still included in some patchsets! This is no longer needed! If it does make a difference, then either there is a bug in your CD writing software or the kernel command table is missing a valid command entry. If this is the case, send me an email, and I’ll fix things up with minimum effort involved. I’ve done this a few times before and I’m happy to do it again.

Another kernel change made at the same point in time resulted in the ability to write CDs being restricted to users with write access to the CDROM device. That makes sense, right? (Previously you just needed read access.)

So the first thing to do is check that you have write access to your CDROM node. Assuming your CDROM drive is /dev/hdc, run the following:

# [ -w /dev/hdc ] && echo "OK" || echo "Fix up your permissions"

If you see a message telling you to fix up your permissions, the usual procedure is to use ls to see which group owns the node (usually cdrw or cdrom), add yourself to that group, and then try again. This requires that the group has access to write to the node too.
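The check and the follow-up ls can be wrapped into one small helper. This is just a sketch: the function name check_cd_node is made up, and /dev/hdc is only the example node used above.

```shell
# Report whether the current user can write to a device node and,
# if not, show the listing so the owning group can be read off.
check_cd_node() {
    node=$1
    if [ -w "$node" ]; then
        echo "OK"
    else
        echo "Fix up your permissions"
        # Owner and group are the third and fourth columns of the listing.
        ls -l "$node" 2>/dev/null || echo "(cannot list $node)"
    fi
}

# Usage: check_cd_node /dev/hdc
```

On most systems root can add a user to a group with gpasswd -a youruser cdrom (substitute your own user and the group shown by ls); you need to log in again before the new group membership takes effect.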

Moving on. Some software used to advise that you set the suid bit on your cdrecord binary. This means that cdrecord will always run with root privileges, even if you execute it from a standard user account. Some kernel memory management changes broke cdrecord’s ability to run as root for a while. Even though it’s fixed now, it’s worth turning this off for the security implications alone:

# chmod -s /usr/bin/cdrecord
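If you are not sure whether the bit is set in the first place, the shell’s -u test reports it. A small sketch follows; the function name has_suid is made up, and the cdrecord path is the one used above, which may differ on your system.

```shell
# Report whether the set-user-ID bit is set on a file.
has_suid() {
    if [ -u "$1" ]; then
        echo "suid set: $1"
    else
        echo "suid not set: $1"
    fi
}

# Usage: has_suid /usr/bin/cdrecord
```

Run it before and after the chmod to confirm that the change took effect.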

Note that xcdroast does weird things with suid bits too. For the purpose of testing, I suggest you keep away from xcdroast.

In terms of software usage, people seem to be historically used to running commands like cdrecord dev=ATAPI:0,0,0. An alternative method is cdrecord dev=/dev/hdc. There’s been an eternal debate as to which way should be encouraged - the cdrecord author doesn’t like the dev=/dev/hdc method, as you can probably gather from the warnings that appear. Right now, all you want to do is write CDs, so no matter what you think, you need to use the dev=/dev/hdc method. Avoid the dev=ATA:x,y,z notation too.

Another possible problem is that your CD writing software is opening up the CD node as read-only. The kernel will reject any write commands sent to it. The software needs to be modified to open the CD node as read-write. I fixed up some of the software such as dvd+rw-tools and cdrwtool a while back - but if there is any more problematic software, please let me know. (This is why it’s essential that you are running the latest versions of the software in question.)

Finally, burn as your standard user account, not as root.

Hopefully this clears some things up. If you do experience problems after following the above precisely, please file a bug.