discussion

Did you look at dirsplit(1), part of the mkisofs package?

-Dave

Thanks, dirsplit is just the thing (FWIW, I think it's only in Debian's mkisofs, not upstream).

But its algorithm sucks: Pack items into N buckets, adding each item in the list to the first bucket that it will fit in. Repeat X times (default 500), shuffling the list in between each time. Take the solution with least waste. Ugh. There has to be a better approach than that. So the new question is, what's a better way to do that?

Would this work: Sort the items by size, then starting with the largest, add each item to the first bucket that it will fit in. It might be that there are cases where a group of items really should go in the same bucket for best results (fitting together well and using up all the space), and don't using this simple approach. Would need to runs some tests to see..

--Joey

I implemented something like this for splitting up video source files across DVDs for international shipping. It does both best-fit and first-fit and then uses whichever result requires the smallest number of disks. This seems to work pretty well. It is also capable of splitting files larger than a certain size, since ISO 9660 has a file size limit of 2³¹-2 bytes.

--?BenHutchings

Wouldn't mind a link to the code for that Ben. :-)

--Joey

The algorithm you suggested above is (aiui) the greedy approximation solution to the knapsack problem. That's what I've used. It seems to work well.

-Alok

I once wrote a small Ruby program for this exact reason. It was initially meant for personal use, so the usability/efficiency could probably be improved a lot, but you are free to play around with it. I put it online at dirpack.rb . It uses a solution to the knapsack problem that first sorts the files by size and then uses a first-fit algorithm.

-Bas