Life in the back-channels: a *bsd installation saga.

by Robert Bernstein
28 Homestead Avenue, Esmond, RI, 02917
401-231-5502
poobah@ruptured-duck.com
2620 words
4/3/2001

There's more to the story

When advocates of supposedly rival operating systems square off in public, it's not hard to find the extreme positions. It is a different story on the back-channels, however; absent the glare of public exposure, cooperation and sharing can quietly move forward. Partisanship, of its very nature, needs to be noisy.

A rational approach suggests that, since nothing is created 'ex nihilo' ("out of nothing"), OpenBSD's historical derivation from NetBSD should not necessarily prove fatal to amicable relations between the two camps. Everything is derived somehow, some way, from something else. But, despite official statements from both sides declaring flamewars out of place and counter-productive, tension and distrust persist between the two projects.

This isn't the whole story. Code is shared between the two projects, and the sharing isn't all "one way," which seems to be the sore point with some. Recently I found myself quite by accident playing the middleman and facilitating the transmission of a little bug fix from OpenBSD to NetBSD. It was a small fix, but the episode bears out my contention that most *bsd developers are more interested in writing good code than in prosecuting wounded feelings.

SCSI drives, OpenBSD, and me

The arrival of a cable modem in our house convinced me I had to learn either ipchains on my Debian Linux box, or the elegant syntax of the ip filter package's rules. Some of the heavyweights on the Linux kernel "team" had taken to refining that kernel's packet filtering code, and the effort to port the ip filter package to Linux had fallen by the wayside. So, like many newly hardwired Internet denizens, I installed OpenBSD 2.4 on an unimpressive bit of hardware scavenged from my collection of retired components.
I had my first firewall: a P-166, 32 megs of RAM, and an old Maxtor IDE drive. The only high-end part was a new Intel PRO/100 NIC; for some reason, on that system my old NE2000 clone could not talk to the cable modem.

My Debian workstation sported an ultra-wide SCSI adapter and IBM drive. The adapter, a very nice Mylex BT-950, was purchased before I developed an interest in *bsd systems, and some disappointment set in when I realized that it was not supported by any of the *bsd's I checked. I was hooked, though, on SCSI drives (why and how is another story), so I began to think in terms of SCSI hardware I might someday use in a *bsd system. One reason I had decided on the Mylex was cost; the ubiquitous Adaptecs, although widely supported, seemed overpriced right across the board. I began to think about using an Advansys adapter in my fantasy *bsd computer.

Addicts anonymous

While I became familiar with OpenBSD I installed and looked at other *bsd operating systems. I am rather the operating system addict: right now in the room are four computers housing six drives and a total of nine operating systems: Win95, OS/2, OpenBSD, NetBSD (2 installs), Debian Linux (3 installs), and, for old times' sake, Novell DOS7. Some of those partitions have been shuffled many times; for instance, the second NetBSD was until recently a second OpenBSD, and before that contained FreeBSD.

I settled on NetBSD as the operating system I was going to "live in" for a while in my new installation. This choice was made on grounds of curiosity, and a subjective sense, gained while experimenting with all of the OS's mentioned above, that I would feel most "at home" with NetBSD. I suspect the low "hype factor" associated with NetBSD appealed to my conservative side. Also, NetBSD's package collection was very attractive. If this seems all very fuzzy and unsatisfactory, it is.
There's no accounting for taste, and I am certainly not going to enter into the advocacy wars with arguments claiming NetBSD is "better" than the others. For one thing, I don't know enough to make those arguments, or make them well at any rate.

Into the breach

The day had come. Finally I had my new Advansys card and a new Seagate SCSI drive. Computer cases were opened, drives and boards were shuffled around, and I set out to install NetBSD 1.5 on the new drive. The boot floppies were created and an ftp install was begun. My excitement dimmed as this filled up the screen:

    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus
    adw0: DMA Error. Reseting bus

The first distress signal went out to Usenet. I posted before doing much research on the problem. This is not good netiquette but sometimes I just want to say "Ouch" out loud. The simple expedient of pasting the error message into google had produced a recent hit on the NetBSD current-users list, so at least I would not come across as a total lamer. And, the NetBSD misc newsgroup is not so busy that I felt much more than a twinge of guilt for this hasty post. The error message itself made for a good Subject header (the misspelling was in the original):

--
From: Bob Bernstein
Subject: Advansys 'DMA Error. Reseting bus'
Newsgroups: comp.unix.bsd.netbsd.misc
Date: 2001-02-08 15:51:32 PST

I am as I write this watching these error messages stream by as I install NetBSD on a new Seagate scsi/Advansys combination. I note in the mailing list archives that I am not the first person to experience this, but I can't find any _answers_ to the problem in those archives.
--

I would soon learn that the reason I found no answer in the mailing list archives was that no answer had been found.
This wasn't a case of folks failing to post back to a list the resolution of an issue with the misguided intention of not contributing to a busy list's traffic. (Is there anything more irksome?)

In any event, the NetBSD misc newsgroup was not forthcoming but a search through the NetBSD bug collection revealed a recent PR (problem report) had been filed on this very glitch. I reached out to the author of that report:

--
Date: Thu, 8 Feb 2001 19:12:20 -0500
From: Bob Bernstein
To: Herb Peyerl
Subject: Advansys errors

Hi,

I am in the process of replicating your experience, described in your recent PR, of watching 'DMA Error. reseting' stream up the monitor screen as I try to install 1.5. It's a new Advansys 3940UW. I'm getting the errors while ftping the distribution sets; I haven't even gotten to untarring them.

Have you had luck with this thing? I am new to NetBSD culture, so if there's anything I can do to further call attention to this, please let me know also.
--

In his reply Herb talked me into filing my first NetBSD PR, giving my version of our mutual problem. He had not made any more progress than I in solving the thing.

Along the way I ran into a fairly common problem: how do I capture the dmesg output for use in a PR when I have booted from a floppy? I knew I had seen the answer to this on a FAQ somewhere. It dawned on me that, since I have spent much more time hanging out in OpenBSD culture than in NetBSD, the answer might be in the OpenBSD FAQ, and sure enough, there it was:

http://www.openbsd.org/faq/faq14.html#14.7

It is simplicity itself; keep an ffs formatted floppy handy, and after dropping down to the shell mount it on /mnt and do something like:

    # cat /kern/msgbuf > /tmp/my-dmesg
    # cp /tmp/my-dmesg /mnt

Back to OpenBSD

I had taken things as far as I could. The two PR's, mine and Herb's, were in. The alarm had been sounded on comp.unix.bsd.netbsd.misc, and no one seemed to have a clue.
A couple more emails passed back and forth between Herb and me, and I got some clues from him about ways to keep track of the progress, or lack thereof, made in fixing adw in NetBSD current. All this new SCSI hardware was sitting idle though, and I was getting itchy.

The cure was a fresh installation of OpenBSD, and I set to that task. Very quickly I ran into trouble, since the driver for my Advansys card was not on the boot floppy image specified in the docs. I posted this problem to the OpenBSD misc mailing list, and the next morning I heard from Ken Westerback, the OpenBSD developer who maintains the Advansys driver. The driver had been moved to another boot floppy (OpenBSD provides three different boot floppy images), and with that I easily completed an ftp install of OpenBSD 2.8 on my spiffy new SCSI hardware.

But my joy was short-lived. When I started my first setup task, which is always to install the tcsh shell, I managed a rare feat: to completely lock up an OpenBSD system. Again, the flare went up on misc@openbsd.org, in the form of another message in the "which boot floppy" thread:

--
Date: Fri, 9 Feb 2001 11:34:30 -0500
From: Bob Bernstein
To: Kenneth R Westerback
Cc: misc@openbsd.org
Subject: Re: adw driver on 2.8 install floppy?

On Fri, Feb 09, 2001 at 09:30:17AM -0500, Kenneth R Westerback wrote:

> The adw driver did move from floppy28.fs to floppyB28.fs for 2.8. In
> fact I thought it moved there for 2.7, but obviously not. :).

Found it.

> ** NOTE ** There is a bug that was fixed just after 2.8 that can cause
> the adw driver to crash the system with certain chipsets (Intel BX of
> some vintage). Your dmesg indicates you don't have that chip, but I
> recommend you update the adw drivers to -current as soon as possible as
> I have also fixed some other things in there recently.

Well, I have the crash:

    adw0: DMA Error. Resetting bus.
I used lynx to grab my first package, which is always tcsh, and when lynx says "save to disk" after downloading the archive, I pressed 'enter' and went into never-never land, locked up tight as a drum.

I wonder: is it possible to aim the install floppy at a 'current' directory to grab the install sets? Would that get me the newer adw?
--

The lesson here is (duh): if a developer suggests updating a given piece of code, fail not thereof at your peril! In my haste and inexperience I overlooked this hint; installing OpenBSD via ftp from the current tree was something I had never tried. It was simplicity itself however. I placed the installation sets from snapshots in a directory on my old (2.7) OpenBSD system, and pointed the new ftp install at that location. Finally I was off and running.

And, in his reply, Ken drops the hint that actually gets this story off the ground:

--
Date: Fri, 9 Feb 2001 12:10:59 -0500
From: Kenneth R Westerback
To: Bob Bernstein
Subject: Re: adw driver on 2.8 install floppy?

Yep. DMA Error is the fault. It's a one character fix (i+i should be i+1 in the scatter gather setup logic) that took me six months to find.

If you point the floppy at a 'current' directory you will indeed get the latest and greatest installed. The bsd in your -current directory will have been compiled with the latest fixes.

You are lucky you can get that far! Usually it crashes during installation. At least that's what happened on the machine I finally found that 'reliably' failed so I could track it down.

Hmm, on reading your message again, it seems you may not be getting that far after all. Your best bet might be to get a snapshot floppyB28.fs from the web. Assuming you have a working machine of some kind. Or I can email you a -current one (bleeding edge!).

Let me know how it goes. adw was my first driver update and holds a special place in my heart!

....
Ken
--
Date: Fri, 9 Feb 2001 15:52:52 -0500
From: Bob Bernstein
To: Kenneth R Westerback
Subject: Re: adw driver on 2.8 install floppy?

On Fri, Feb 09, 2001 at 12:10:59PM -0500, Kenneth R Westerback wrote:

> Yep. DMA Error is the fault. It's a one character fix (i+i should be
> i+1 in the scatter gather setup logic) that took me six months to
> find.

Ach!!! Talk about recipes for madness!! Congrats on staying with it for six months! Btw, adw in the latest NetBSD is broken, and throws the same error. I wonder if they have that same glitch?

> If you point the floppy at a 'current' directory you will indeed get
> the latest and greatest installed. The bsd in your -current directory
> will have been compiled with the latest fixes.

I'm using everything: floppyB28.fs, bsd and distribution sets, from "snapshots". So I guess that's "current;" I'm a little fuzzy on the terminology here.

> Your best bet might be to get a snapshot floppyB28.fs from the
> web.

That's what worked.
--
Date: Fri, 9 Feb 2001 18:00:57 -0500
From: Kenneth R Westerback
To: Bob Bernstein
Subject: Re: adw driver on 2.8 install floppy?

Good to hear.

Yeah, NetBSD still has the bug since the guy I was working with (who actually wrote all the code) at NetBSD has dropped off the face of the earth and I don't have another contact. I haven't submitted a bug report yet, just laziness on my part.

....
Ken
--

The corner is turned

I still had a NetBSD install in the back of my mind, so I pondered how I might get Ken to communicate his fix of adw to the NetBSD camp. I had tried grepping for the 'i+i' typo, but I couldn't find it. I decided to consult Herb on these questions. Some of you at this point, those more familiar than I with how C code should be written, already know what's coming! Herb got right back to me:

--
To: Bob Bernstein
Subject: Re: Advansys errors
From: Herb Peyerl
Date: Sun, 11 Feb 2001 16:22:40 -0700

> Ken said he had found an 'i+i' in his code that should have been an
> 'i+1'. (Eeek.)
> I've grepped the adw code for i+i and didn't find it.

Ah yes, there it is:

/sys/dev/ic/adw.c:

        if (--sg_elem_cnt == 0) {
                /* last entry, get out */
                sg_block->sg_cnt = i + i;
                sg_block->sg_ptr = NULL; /* next link = NULL */
                return;
        }
--

Of course! Properly formatted code would never contain a literal 'i+i'; it would be written 'i + i', with spaces around the operator, which is why my grep came up empty. I wasn't smart enough to think that through.

It was, as the old saying goes, all downhill from here. Herb tested his fix of adw.c and made the resultant kernel available to me also. Finally my new Seagate had a working install of NetBSD current on it. Victory!

Herb closed our two PR's (12114 and 12158), with a rather nice note, or so I thought (from the cvs log for adw.c):

"Fix for kern/12114 and kern/12158 Advansys DMA errors. Reported by Bob Bernstein who heard from Kenneth Westerback that this might be the problem. Tested by HP."

And, I heard from Ken, "I saw the fix go in, and the credit. Thanks for passing that along."

Clearly, there are developers on both sides of the Net/Open *bsd equation who have not taken to the streets waving pitchforks and banners, so to speak, seeking to overturn the other's ramparts. My guess: most of those who participate in the development of these systems have placed their allegiance first and foremost in writing good code for thoughtfully designed, effective systems. You find them on the back-channels.
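
A postscript for the curious

For readers who wonder how one character can matter so much, here is a minimal sketch in C. It is not the adw.c code. The structure layout, the SG_PER_BLOCK constant and the fill_block function are invented for illustration; only the sg_cnt and sg_ptr field names come from the snippet Herb quoted. The sketch fills one scatter-gather block and records how many entries are valid, using either the mistyped expression or the corrected one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; the real structures in
 * adw.c are not reproduced here. */
#define SG_PER_BLOCK 15                 /* assumed capacity of one block */

struct sg_entry {
    unsigned long addr;                 /* start of one memory segment */
    unsigned long len;                  /* length of that segment */
};

struct sg_block {
    struct sg_entry entry[SG_PER_BLOCK];
    int sg_cnt;                         /* number of valid entries */
    struct sg_block *sg_ptr;            /* next block, NULL if last */
};

/*
 * Copy n segments (1 <= n <= SG_PER_BLOCK) into blk and record the count.
 * With buggy != 0 the count is computed as i + i (the typo);
 * with buggy == 0 it is computed as i + 1 (the fix).
 * Returns the count that was stored in blk->sg_cnt.
 */
static int fill_block(struct sg_block *blk, const struct sg_entry *src,
                      int n, int buggy)
{
    int remaining = n;
    int i;

    assert(n >= 1 && n <= SG_PER_BLOCK);

    for (i = 0; i < n; i++) {
        blk->entry[i] = src[i];
        if (--remaining == 0) {         /* last entry, get out */
            blk->sg_cnt = buggy ? i + i : i + 1;
            blk->sg_ptr = NULL;         /* no further block */
            return blk->sg_cnt;
        }
    }
    return -1;                          /* not reached when n is valid */
}
```

With four segments the corrected form stores a count of 4 while the mistyped form stores 6, so whatever consumes the block would be told about two entries that were never filled in. With a single segment the mistyped form stores 0. The two expressions agree only when i happens to be 1, that is, for exactly two segments. As for finding such a line, a search pattern that tolerates spaces, for example grep -n 'i *+ *i' adw.c, would have matched where my plain 'i+i' did not.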