07

Lots of interesting stuff today..

Saw a tiny fawn in the woods, less than 10 feet away.
Got lost in the woods w/o water for 45 minutes; managed to walk down some trails 2 and 3 times, which is amazing for me, but I was pretty dehydrated and out of it.
Got my first ever speeding ticket on the way over to the dilab. :-( New switch from HP works great though.
Spent far too many hours reproducing and then boiling down a minimal testcase for one of the strangest perl bugs I've ever seen.

Dear lazyweb --

Is there any way to make an HTML list be formatted the way ls formats a list of filenames? That is, in columns, where the number of columns is determined by the width the list has to display in, the width of the elements, and the number of elements.

I coded this same display for debconf's readline frontend a while back and it was not entirely trivial, so I doubt that CSS can support it, but would be happy to be proven wrong.
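For reference, the layout being asked for can be sketched in a few lines of Python. This is only an illustration of the ls-style algorithm described above, not debconf's actual code (which is Perl); the function name `columnize` and the `gap` parameter are made up for the example. It tries the largest column count first and takes the first layout that fits the available width, filling columns top to bottom.

```python
import math


def columnize(items, width=80, gap=2):
    """Lay out strings in ls-style columns (filled top-to-bottom, then left-to-right).

    Hypothetical helper: returns the lines of the layout that uses as many
    columns as fit in `width`, with `gap` spaces between columns.
    """
    if not items:
        return []
    n = len(items)
    for ncols in range(n, 0, -1):
        nrows = math.ceil(n / ncols)
        # Skip column counts that would leave trailing columns empty.
        if math.ceil(n / nrows) != ncols:
            continue
        cols = [items[i * nrows:(i + 1) * nrows] for i in range(ncols)]
        widths = [max(len(s) for s in col) for col in cols]
        total = sum(widths) + gap * (ncols - 1)
        # A single column is the fallback even if it is too wide.
        if total <= width or ncols == 1:
            break
    lines = []
    for r in range(nrows):
        cells = [col[r].ljust(w) for col, w in zip(cols, widths) if r < len(col)]
        lines.append((" " * gap).join(cells).rstrip())
    return lines
```

Each column gets its own width, which is what makes this awkward to express as a fixed grid: the column count depends on the content as well as on the container.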

Hmm, beginning to think I might be capable of designing a nice-looking web page after all. I know that most people characterise my web pages as "looks like the 90's"; that's because my priorities in making a web page are not very related to many other people's priorities in making a web page. (Although this is less true than it was in the early 00's due to some shifts on both sides.)

However, I'm pretty happy with ikiwiki's new RecentChanges page from both my perspective, and from what I imagine is the perspective of most users of the web.

I'm also happy that ikiwiki let me turn its old RecentChanges into this new page w/o needing to change any code. Means that I can leave the rest of the web-design up to the non-coders. :-)

I'm looking for a simple tool that, given a set of files, can split it into subsets that are smaller than a given size. It doesn't need to split individual files, just pack them into the subsets in an efficient way. It could actually move files into subdirs to create the subsets, or could just output a list of what goes where.

Think archiving files -- lots of files -- to DVD. Doing this manually gets old, fast.

I can't seem to find this tool. Am I not searching for the right terms to find it, or does it need to be written (or dug out of someone's ~/bin/) and added to moreutils?

Update:

Lots of feedback on this one. This problem is called the knapsack problem (which I should have known). Specifically, the 0-1 variant. It's NP-hard.

The best algorithm I've seen suggested is to go through the files, largest first, and put them into the first volume that will hold them. This is implemented in packcd.
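The largest-first, first-fit approach is usually called "first-fit decreasing" in the bin packing literature. Here is a minimal sketch of it in Python; it is not packcd's code, and the function name and the `(name, size)` input format are assumptions made for the example.

```python
def first_fit_decreasing(files, capacity):
    """Pack (name, size) pairs into volumes of at most `capacity`.

    Illustrative sketch: sort largest first, then put each file into the
    first volume that still has room, opening a new volume if none does.
    Returns a list of volumes, each a list of file names.
    """
    volumes = []  # each entry is [remaining_space, [names]]
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} ({size}) is larger than a volume ({capacity})")
        for vol in volumes:
            if size <= vol[0]:
                vol[0] -= size
                vol[1].append(name)
                break
        else:
            # No existing volume had room, so start a new one.
            volumes.append([capacity - size, [name]])
    return [names for _, names in volumes]
```

It is a heuristic, so it will not always find the minimum number of volumes, but it is simple and tends to get close.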

Debian's mkisofs package has a dirsplit that uses a much worse approach, randomising the list, putting each file into the first volume that will hold it, and iterating 500 times to find the least wasteful packing. Yugh. On the other hand, it does know various things about how much space filenames will take on the CD.

sync2cd can do basic splitting, although not smart ordering, as well as lots of other stuff.

gafitter looks intriguing. It uses Genetic Algorithms to fit files into volumes of a given size. And it works as a filter.

I still think it would be nice to have a generic unix tool that could take du input and output the list of files and which set to put them in. There are, after all, applications for this beyond packing CDs..
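A rough sketch of what such a filter might look like, in Python. It assumes du-style input lines of the form "SIZE<TAB>PATH" and a capacity given in the same units as the sizes; the function names are invented for the example, and the packing is the same largest-first, first-fit heuristic described above.

```python
def parse_du(lines):
    """Parse du-style lines ("SIZE<TAB>PATH") into (size, path) pairs."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        size, path = line.split("\t", 1)
        entries.append((int(size), path))
    return entries


def assign_sets(entries, capacity):
    """Return (set_number, path) pairs, numbering sets from 1.

    Entries are taken largest first and placed in the first set with room.
    """
    free = []  # remaining space in each set
    result = []
    for size, path in sorted(entries, key=lambda e: e[0], reverse=True):
        if size > capacity:
            raise ValueError(f"{path} does not fit in a set of size {capacity}")
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                break
        else:
            free.append(capacity - size)
            i = len(free) - 1
        result.append((i + 1, path))
    return result
```

Wired up to stdin and stdout (for example `assign_sets(parse_du(sys.stdin), capacity)`, printing one "SET<TAB>PATH" line per entry), this could sit at the end of a `du` pipeline. The units of the sizes depend on how du is invoked, so the capacity has to match.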

Update 2:

I'm currently using gafitter, although oddly not using its GA mode, but its simple packing mode (because I want to keep subdirs on the same dvd). The script I use to split out a dvd worth of stuff with gafitter and burn it with growisofs can be downloaded from my repo here.

Here's an article on linux.com about moreutils. Whee!

Linux has the benefit of a steady barrage of new applications, utilities, software suites and tools -- as any casual perusal of freshmeat and SourceForge shows. One new bundle of software, the moreutils package by Debian developer Joey Hess, stands out from the rest. Moreutils is a collection of small, general-purpose tools that Hess says "nobody thought to write 30 years ago."

Back from the beach. Rather than writing any kind of summary, here's the Ocracode for this trip:

OBX1.0 P1/7 L7 SA5s++b++c+++/A17d++b-c-- U4(setup alone,Kai,Daddy swims)
T6f2-b0 R1w Bn-b++m++ F++u++ SC++s++g1 H+++f2i5 V+++ E+++r++

For comparison, retrospectively generated code for last year's trip:

OBX1.0 P6 L4? SB10dc++b++ U6 T2 R4tsw Bn--m--b-- F- SC-s++++g5
H--i0 V-  E++r++

Back from seeing Mountain Stage record a radio show at the Paramount. Dale Jett, Tim O'Brien, the attack mandolin and bass of the Yonder Mountain String Band, Odetta (who has a commanding presence onstage even before she speaks), and Ralph Stanley & The Clinch Mountain Boys. Great fun watching Larry Groce crowd them all onstage at the end and arrange a song on the spur of the moment.

First time I've been to a radio show. Got me thinking back to one of the first times I remember being struck by something on the radio, when we were driving up to the tobacco warehouse one night in Abington in Silas's truck, with its noxious chaw spit can, and I heard this ethereal mountain voice coming out of the radio, sounded 50 years ago and at the same time so immediate. Could have been Ralph Stanley, come to think..

After the show, I humped a dorm-style cube fridge half an hour downhill thru the woods in the dark. But that's a different story..

I have an important decision to make about moreutils, my collection of new unix tools. The question is whether to stand firm behind the idea of only adding tools to moreutils that are truly unique and not available in similar form in other packages, or whether to collect good tools to fill in missing gaps in the standard unix toolset, even if they are similar to already existing, but possibly hard to find, tools.

So far I've had an easy time deciding to reject some tools like add, todist and tostats, which are somewhat special purpose and which turn out to already be mostly implemented in the numutils package. And I'm also fairly comfortable with Lars's and my decision to not include mime, which is similar to File::MimeInfo's mimetype program, if only because implementing that requires a lot of code or a long dependency chain.

The decision is harder for things like shuffle/unsort, which both fill in the gap of a file randomisation tool. There already are packages in Debian (bogosort and randomise-lines) that provide this functionality. Another example is srename, which is similar to the rename program hidden in perl, which is insanely useful and rather underused.

It's also tricky since I could well be missing existing tools that overlap with moreutils. For example I just learned of the renameutils package, which contains a qmv. That is close to the same thing as vidir, although limited to filename renames (no deletions), and rather more complex, with an interactive command line mode. I've decided that it's worth keeping vidir despite this, if only for the deletion support and the possible broader scope later, but it does point to a more general issue.

I've already concluded earlier that:

Maybe the problem isn't that no-one is writing them, or that the unix toolspace is covered except for specialised tools, but that the most basic tools fall through the cracks and are never noticed by people who could benefit from them.

One way that tools fall between the cracks, after all, is by being spread among lots of little obscure packages like File::MimeInfo, bogosort, randomise-lines, odd corners of perl, mmv, renameutils, etc. You might know about some of these packages, but you probably didn't know about all of them -- I know I didn't. I suspect that the authors of some of them didn't know about others and duplicated work.

A single package that collected good generic, consistent, and simple implementations of tools, even if some of them were not unique to it, could benefit from getting more attention than lots of small packages like those, and both help users find out about these tools as well as focus development energy on them to make them better.

So which is more important, focusing on collecting and writing unique tools or promoting slightly less unique tools?

This might just be artifacts of staring at a screen too much, but:

I've seen more than one web page with a background that had some bit that looked like a hair or speck of dirt sitting on the screen. And reflexively tried to wipe it off the screen.
My old laptop had dead pixels that I would, every now and then, think were dust specks on the screen and try to remove.
Just noticed a strange rounded and transparent little button pop up on a web page, with some odd icon inside, so I clicked on it. It seemed to react to the mouse but nothing else happened. It turned out to be a water droplet, magnifying what was behind it in confusing ways. (They don't call it "Aqua" for nothing..)
Must upgrade my PowerManga fighter to kill the attacking moths. Die die die!

Hmm, it may be time to write my second ever X program. Exporting random "dead" pixels to an open $DISPLAY would be so Evil.

I am not happy with dh_python in its current state (as NMUed into debhelper). It is preventing me from feeling good about or generally maintaining debhelper. This is a plan for fixing that.

Transform dh_python into a perl module or program by some other name and possibly different interface, and put this in some other package(s).
Modify dh_pysupport and dh_pycentral to call above so they take over dh_python's part of the work.
Reduce dh_python to a no-op warning script and put on deprecation track for removal.
Add a regression test to debhelper that fails if any debhelper script exceeds 150 lines, excluding POD docs. (Done in svn)
If at any point the python maintainers find an approach that does not involve multiple competing implementations and that is either simple or has the complexity encapsulated into some other program, I can add back a dh_python.

Until then the current mess will be Not My Problem.
Be much, much, much more cautious accepting niche programs into debhelper in the future. :-(
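
The size check from the plan above (fail if any debhelper script exceeds 150 lines, excluding POD docs) could be sketched like this. This is not the test that went into debhelper's svn, just an illustration in Python; it assumes POD blocks start at a line beginning with "=" followed by a letter and end at "=cut", and that every non-POD line counts, blank lines and comments included.

```python
import re


def code_line_count(source):
    """Count the lines of a Perl script that are not inside POD blocks."""
    count = 0
    in_pod = False
    for line in source.splitlines():
        if in_pod:
            if line.startswith("=cut"):
                in_pod = False
            continue
        if re.match(r"=[a-zA-Z]", line):
            # A POD directive opens a block; a stray =cut is skipped on its own.
            in_pod = not line.startswith("=cut")
            continue
        count += 1
    return count


def oversized_scripts(scripts, limit=150):
    """Given {name: source}, return the sorted names that exceed `limit` lines."""
    return sorted(name for name, src in scripts.items()
                  if code_line_count(src) > limit)
```

A regression test would read each dh_* script, pass the sources in, and fail if the returned list is non-empty.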

Possible technical complications:

dh_python supports params for passing additional module directories. dh_pysupport does too, but it seems that dh_pycentral does not. Need to check if this means that no one using dh_pycentral can use it with non-standard directories, so none of them pass directories to dh_python.
dh_python has -V and -X flags that affect its behavior and the other two programs would not have access to this information.
- I doubt that -X is used widely. Grep.
- -V is a redundant interface, since it does something approximating what debian/pyversions does. So things should be able to switch to the other interface, except in edge cases (although IMHO it's not an appropriate interface for a debhelper program), and for the edge cases, -V could be added to the other two programs.
It's possible that someone might set dh_python's -n flag but not also set it on their call to dh_pycentral or dh_pysupport. Can't imagine why though.
Not all packages use the new python policy yet. For example, debconf doesn't. I'm not sure if that's a bug or not. One alternative would be reverting dh_python to the pre-NMU version (plus a small shim to make it a no-op if it detects the new policy), to avoid breaking such packages until they transition.

Wow, just spent 4 hours integrating other people's patches and plugins into ikiwiki. That's what they coded up while I was asleep last night.

Now I'm just waiting for all of them to go to sleep so I can code up the ikiwiki feature I was planning to work on today: RSS aggregation, so it can be used as a Planet.

Anyway, as far as we know, ikiwiki is now 100% utf-8 supporting right down to the filenames in svn. And it can do tag clouds. And, perhaps best of all, I don't even need to write blog entries anymore, since ikiwiki now supports polygen, and can just generate text for me.

Well, I did it: I shoehorned feed aggregation support into an ikiwiki plugin. So now it can run as a cron job, pulling in updates from feeds, turning them into wiki pages, and saving the pages to disk, then rebuilding the wiki.
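
The core step, turning feed items into pages, can be sketched roughly as follows. This is not the plugin's code (ikiwiki is written in Perl); it is a Python illustration that only handles plain RSS 2.0 `<item>` elements, and the page naming scheme, the `.html` extension, and the function names are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET


def slugify(title):
    """Turn a post title into a filesystem-friendly page name."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug or "untitled"


def feed_to_pages(feed_name, rss_xml):
    """Map each RSS item to a page: returns {relative_path: page_text}.

    Hypothetical layout: one page per item under a directory named after
    the feed, containing the title, the link, and the item body.
    """
    root = ET.fromstring(rss_xml)
    pages = {}
    for item in root.iter("item"):
        title = item.findtext("title", default="untitled")
        link = item.findtext("link", default="")
        body = item.findtext("description", default="")
        path = f"{feed_name}/{slugify(title)}.html"
        pages[path] = f"{title}\n{link}\n\n{body}\n"
    return pages
```

A cron job would fetch each configured feed, write the returned pages to disk, and then trigger a wiki rebuild; fetching, Atom support, and expiry are left out here.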

Turning a wiki that grew blog support into a Planet engine has some interesting consequences. One of which is that the aggregated posts can be edited in the wiki unless locked, but more interesting is that the posts are first-class wiki pages that can be linked to from elsewhere in the wiki, and that can stick around even after they've fallen off the Planet's front page (although I do need to code up the support for eventually expiring them).

Another interesting thing is how the feeds are configured: By directives embedded in the wiki pages themselves. So no need for config files, and the wiki page that configures the feeds builds into the web page that lists and shows the status of the feeds.

The nicest thing about it though, is how little code I had to write, and how targeted it was. 300 lines exactly for it all. No need to deal with RSS generation, page formatting, full text search, any of that, it was all already there and works with the new source of pages like it would with any other pages.

Very happy with this. If you want to check it out, I've imported the data from Planet Debian upstream into updo.kitenet.net. Now if you find a new blog for a Debian upstream author, you can just edit the feeds page and add it yourself.

PS: No, I don't know why Jeff Waugh is flooding it, but it's probably a minor bug in the page creation date parsing. Too late to check. (Update: Bug in the atom parser, bug filed w/patch.)

PPS: I know about the problem with Lennart Poettering's blog encoding; a patch is in the BTS.
