I'm looking for a simple tool that, given a set of files, can split it into subsets that are smaller than a given size. It doesn't need to split individual files, just pack them into the subsets in an efficient way. It could actually move files into subdirs to create the subsets, or could just output a list of what goes where.

Think archiving files -- lots of files -- to DVD. Doing this manually gets old, fast.

I can't seem to find this tool. Am I not searching for the right terms to find it, or does it need to be written (or dug out of someone's ~/bin/) and added to moreutils?


Update:

Lots of feedback on this one. This problem is called the knapsack problem (which I should have known). Specifically, the 0-1 variant. It's NP-hard.

The best algorithm I've seen suggested is to go through the files, largest first, and put each one into the first volume that will hold it. This is implemented in packcd.
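
For illustration, a minimal sketch of that largest-first, first-fit packing (the classic first-fit-decreasing heuristic); this is not packcd's code, and the command-line interface and roughly-a-DVD size limit in the example are made up:

```
import os
import sys

def pack_largest_first(files, volume_size):
    """Pack (name, size) pairs into volumes of at most volume_size bytes,
    placing each file, largest first, into the first volume with room."""
    volumes = []  # each volume is [free bytes, [names]]
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > volume_size:
            raise ValueError(f"{name} is bigger than a whole volume")
        for vol in volumes:
            if vol[0] >= size:
                vol[0] -= size
                vol[1].append(name)
                break
        else:
            volumes.append([volume_size - size, [name]])
    return [names for free, names in volumes]

if __name__ == "__main__":
    # Sizes of the files named on the command line; limit of about one
    # single-layer DVD (4.7e9 bytes).
    files = [(f, os.path.getsize(f)) for f in sys.argv[1:]]
    for i, names in enumerate(pack_largest_first(files, 4_700_000_000), 1):
        for name in names:
            print(i, name)
```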

Debian's mkisofs package has a dirsplit that uses a much worse approach: randomising the list, putting each file into the first volume that will hold it, and iterating 500 times to find the least wasteful packing. Yugh. On the other hand, it does know various things about how much space filenames will take on the CD.
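
Roughly, that shuffle-and-retry approach looks like this (a sketch of the idea only, not dirsplit's code, and ignoring its filename-overhead accounting):

```
import random

def pack_first_fit(files, volume_size):
    """Put each (name, size) pair into the first volume with room, in list order."""
    volumes = []  # each volume: {"free": bytes left, "names": [...]}
    for name, size in files:
        for vol in volumes:
            if vol["free"] >= size:
                vol["free"] -= size
                vol["names"].append(name)
                break
        else:
            volumes.append({"free": volume_size - size, "names": [name]})
    return volumes

def pack_randomised(files, volume_size, tries=500):
    """Shuffle, first-fit, repeat; keep the packing needing the fewest volumes
    (with a fixed total size, least waste and fewest volumes amount to the same)."""
    files = list(files)
    best = None
    for _ in range(tries):
        random.shuffle(files)
        volumes = pack_first_fit(files, volume_size)
        if best is None or len(volumes) < len(best):
            best = volumes
    return [v["names"] for v in best]
```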

sync2cd can do basic splitting, although not smart ordering, as well as lots of other stuff.

gafitter looks intriguing. It uses Genetic Algorithms to fit files into volumes of a given size. And it works as a filter.
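
I haven't read gafitter's source, so the following is only a toy illustration of the general idea, not its actual algorithm: evolve orderings of the file list and score each ordering by how many volumes a first-fit decode of it needs (the population size, generation count and mutation rate are all made up):

```
import random

def first_fit(order, sizes, volume_size):
    """Decode a permutation of file indices into volumes by first-fit;
    return (volume count, list of (file index, volume number))."""
    free = []        # remaining space per volume
    assignment = []
    for idx in order:
        for v in range(len(free)):
            if free[v] >= sizes[idx]:
                free[v] -= sizes[idx]
                assignment.append((idx, v))
                break
        else:
            free.append(volume_size - sizes[idx])
            assignment.append((idx, len(free) - 1))
    return len(free), assignment

def ga_pack(sizes, volume_size, generations=200, pop_size=30, mutation=0.2):
    """Evolve permutations; fitness is the number of volumes needed (fewer is better)."""
    n = len(sizes)
    def fitness(order):
        return first_fit(order, sizes, volume_size)[0]
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)                 # crude order crossover
            child = a[:cut] + [g for g in b if g not in a[:cut]]
            if random.random() < mutation:               # swap mutation
                i, j = random.randrange(n), random.randrange(n)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = survivors + children
    return first_fit(min(pop, key=fitness), sizes, volume_size)[1]
```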

I still think it would be nice to have a generic unix tool that could take du input and output the list of files and which set to put each of them in. There are, after all, applications for this beyond packing CDs.
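
As a sketch of the interface I have in mind (a hypothetical filter, not an existing tool): read the "size, tab, path" lines du prints on stdin, take the per-set limit as the only argument (in whatever units du was run with), and print "set, tab, path" lines, reusing the largest-first packing from above:

```
#!/usr/bin/env python3
import sys

def main():
    limit = int(sys.argv[1])
    entries = []
    for line in sys.stdin:
        size, path = line.rstrip("\n").split("\t", 1)
        entries.append((path, int(size)))
    sets = []  # each set is [free space, [paths]]
    for path, size in sorted(entries, key=lambda e: e[1], reverse=True):
        for s in sets:
            if s[0] >= size:
                s[0] -= size
                s[1].append(path)
                break
        else:
            sets.append([limit - size, [path]])
    for i, (free, paths) in enumerate(sets, 1):
        for path in paths:
            print(f"{i}\t{path}")

if __name__ == "__main__":
    main()
```

Something like `du -sb * | splitsets 4700000000` would then print which set each top-level directory belongs in (splitsets being a made-up name; without -b, du reports 1K blocks, so the limit would have to be in blocks too).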

Update 2:

I'm currently using gafitter, although oddly not using its GA mode, but its simple packing mode (because I want to keep subdirs on the same DVD). The script I use to split out a DVD's worth of stuff with gafitter and burn it with growisofs can be downloaded from my repo here.
