upstream git repositories

Daniel Pocock posted The multiple repository conundrum in Linux packaging. It's a generally good and useful post, which upstream developers will find helpful for understanding how Debian packages their software, but it contains this statement:

If it is the first download, the maintainer creates a new git repository. If it has been packaged before, he clones the repository. The important point here is that this is not the upstream repository, it is an independent repository for Debian packaging.

The only thing important about that point is that it highlights an unnecessary disconnect between the Debian developer and upstream development. One which upstream will surely find annoying, and should certainly not have to be bothered with.

There is absolutely no technical reason not to use the upstream git repository as the basis for the git repository used in Debian packaging. I would never package software that is maintained in a git repository upstream without doing so.

The details are as follows:

  • For historical reasons that are steadily diminishing in importance, Debian fetishises the tarballs produced by upstream. While upstreams increasingly consider them an unimportant distraction, Debian insists on hoarding and rolling around in its nest of gleaming pristine tarballs.

    I wrote pristine-tar to facilitate this behavior, while also poking fun at it, and perhaps introducing a weak spot with which to eventually slay this particular dragon. It is widely used within Debian.

    .. Anyway, the point is that it's no problem to import upstream's tarball into a clone of their git repository. It's fine if that tarball includes files not present in their git repository. Indeed, upstream can do this at release time if they like. Or Debian developers can do it and push a small quantity of data back to upstream in a branch.

  • Sometimes tagged releases in upstream git repositories differ from the files in their released tarballs. This is actually, in my experience, less due to autotools generated files, and more due to manual and imperfect release processes, human error, etc. (Arguably, autotools are a form of human error.)

    When this happens, and the Debian developer is tracking upstream git, they can quite easily modify their branch to reflect the contents of the tarball as closely as they desire. Or modify the source package uploaded to Debian to include anything left out of the tarball.

    My favorite example of this is an upstream who forgot to include their README in their released tarball. Not a made-up example; as mentioned, tarballs are increasingly an irrelevant side-show to upstreams. If I had been treating the tarball as canonical, I would have released a package with no documentation.

  • Whenever Debian developers interact with upstream, whether it's by filing bug reports or sending patches, they're going to be referring to refs in the upstream git repository. They need to have that repository available. The closer and better the relationship with upstream, the more the Debian developer (DD) will use that repository. Anything that pulls them away from using that repository is going to add friction to dealing with upstream.

    There have, historically, been quite a lot of sources of friction. From upstreams who choose one VCS while the DD preferred using another, to DDs low on disk space who decided to only version control the debian directory, and not the upstream source code. With disk space increasingly absurdly cheap, and the preponderance of development converging on git, there's no reason for this friction to be allowed to continue.

So using the upstream git repository is valuable. And there is absolutely no technical value, and plenty of potential friction, in maintaining a history-disconnected git repository for Debian packaging.

Template Haskell on impossible architectures

Imagine you had an excellent successful Kickstarter campaign, and during it a lot of people asked for an Android port to be made of the software. Which is written in Haskell. No problem, you'd think -- the user interface can be written as a local webapp, which will be nicely platform agnostic and so make it easy to port. Also, it's easy to promise a lot of stuff during a Kickstarter campaign. Keeps the graph going up. What could go wrong?

So, rather later you realize there is no Haskell compiler for Android. At all. But surely there will be eventually. And so you go off and build the webapp. Since Yesod seems to be the pinnacle of type-safe Haskell web frameworks, you use it. Hmm, there's this Template Haskell stuff that it uses a lot, but it only makes compiles a little slow, and the result is cool, so why not.
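
For concreteness, here's a minimal sketch of the kind of Yesod code involved. This is an illustrative hello-world, not the project's actual code: mkYesod, parseRoutes, and whamlet all run Template Haskell at compile time, turning the route table and HTML into generated Haskell code.

    {-# LANGUAGE OverloadedStrings, QuasiQuotes, TemplateHaskell, TypeFamilies #-}
    -- Minimal illustrative Yesod app; not the project's actual code.
    import Yesod

    data App = App

    -- mkYesod and parseRoutes are Template Haskell: the route table below
    -- becomes types and dispatch code at compile time.
    mkYesod "App" [parseRoutes|
    / HomeR GET
    |]

    instance Yesod App

    -- whamlet is a quasiquoter: the HTML is likewise converted to Haskell
    -- code at compile time.
    getHomeR :: Handler Html
    getHomeR = defaultLayout [whamlet|<h1>Hello from a splice|]

    main :: IO ()
    main = warp 3000 App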

Then, about half-way through the project, it seems time to get around to this Android port. And, amazingly, a Haskell compiler for Android has appeared in the meantime. Like the Haskell community has your back. (Which they generally seem to.) It's early days and rough, lots of libraries need to be hacked to work, but it only takes around 2 weeks to get a port of your program that basically works.

But, no webapp. Cause nobody seems to know how to make a cross-compiling Haskell compiler do Template Haskell. (Even building a fully native compiler on Android doesn't do the trick. Perhaps you missed something though.)

At this point you can give up and write a separate Android UI (perhaps using these new Android JNI bindings for Haskell that have also appeared in the meantime). Or you can procrastinate for a while, and mull it over; consider rewriting the webapp to not use Yesod but some other framework that doesn't need Template Haskell.

Eventually you might think this: If I run ghc -ddump-splices when I'm building my Yesod code, I can see all the thousands of lines of delicious machine generated Haskell code. I just have to paste that in, in place of the Template Haskell that generated it, and I'll get a program I can build on Android! What could go wrong?
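
To see what that looks like on something tiny, here's a toy splice, nothing to do with Yesod. Compile it with ghc -ddump-splices and GHC prints the code the splice expanded to, roughly greeting = "hello" in this case, though the exact dump format varies between GHC versions.

    {-# LANGUAGE TemplateHaskell #-}
    -- Toy example: build with  ghc -ddump-splices ToySplice.hs  to see
    -- the code each splice expands to.
    module ToySplice where

    import Language.Haskell.TH

    -- A trivial top-level splice that generates the declaration
    --     greeting = "hello"
    -- at compile time.
    $(return [ValD (VarP (mkName "greeting")) (NormalB (LitE (StringL "hello"))) []])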

And you even try it, and yeah, it seems to work. For small amounts of code that you paste in and carefully modify and get working. Not a whole big, constantly improving webapp where every single line of html gets converted to types and lambdas that are somehow screamingly fast.

So then, let's automate this pasting. And so the EvilSplicer is born!

That's a fairly general-purpose Template Haskell splicer. First do a native build with -ddump-splices output redirected to a log file. Run the EvilSplicer to fix up the code. Then run an Android cross-compile.
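
The heart of the middle step is textual substitution: for each splice recorded in the dump, replace the splice site in the source file with the generated code. Here's a heavily simplified sketch of just that substitution, assuming the dump has already been parsed into a hypothetical Splice record; the real EvilSplicer also has to parse the log and fix up the generated code, as described below.

    -- Heavily simplified sketch of splice substitution. The Splice type is
    -- a hypothetical stand-in for whatever gets parsed out of the
    -- -ddump-splices log; it is not the EvilSplicer's actual API.
    import Data.List (sortBy)
    import Data.Ord (Down(..), comparing)

    data Splice = Splice
        { spliceStartLine :: Int    -- first line of the splice in the source
        , spliceEndLine   :: Int    -- last line of the splice
        , spliceResult    :: String -- the code the splice expanded to
        }

    -- Substitute splices from the bottom of the file upwards, so the line
    -- numbers of splices earlier in the file stay valid.
    applySplices :: [Splice] -> String -> String
    applySplices splices = unlines . go ordered . lines
      where
        ordered = sortBy (comparing (Down . spliceStartLine)) splices
        go [] ls = ls
        go (s:rest) ls =
            let (before, fromStart) = splitAt (spliceStartLine s - 1) ls
                after = drop (spliceEndLine s - spliceStartLine s + 1) fromStart
            in go rest (before ++ lines (spliceResult s) ++ after)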

But oh, the caveats. There are so many ways this can go wrong..

  • The first and most annoying problem you'll encounter is that often Template Haskell splices refer to hidden symbols that are not exported from the modules that define the splices. This lets the splices use those symbols, but prevents them from being used in your code. (A toy example of the problem follows this list.)

    This does not seem like a good part of the Template Haskell design, to be honest. It would be better if it required all symbols used in splices to be exported.

    But it can be worked around. Just use trial and error to find every Haskell library that does this, and then modify them to export all the symbols they use. And after each one, rebuild all libraries that depend on it.

    You're very unlikely to end up with more than 9 thousand lines of patches. Because that's all it took me..

  • The next problem (and the next one, and the next ...) is that while the code output by GHC's -ddump-splices (and indeed, the code in GHC error messages, etc.) looks like valid Haskell code to the casual viewer, it's often not.

    To start with, it often has symbols qualified with the package and module name. ghc-prim:GHC.Types.: does not work well where code originally contained :.

    And then there's fun with multi-line strings, which sometimes cannot be parsed back in by GHC in the form it outputs them.

    And then there's the strange way GHC outputs case expressions, which is not valid Haskell at all. (It's missing some semicolons.)

    Oh, and there's the lambda expressions that GHC outputs with insufficient parentheses, leading to type errors at compile time.

    And so much more fun. Enough fun to give one the idea that this GHC output has never really been treated as code that could be run again. Because that would be a dumb thing to need to do.

  • Just to keep things interesting, the Haskell libraries used by your native GHC and your Android GHC need to be pretty much identical versions. Maybe a little wiggle room, but any version skew could cause unwanted results. Probably, most of the time, unwanted results in the form of a 3-screen-long type error message.

    (My longest GHC error message seen on this odyssey was actually a full 500+ kilobytes in size. It included the complete text of jQuery and Bootstrap. At times like these you notice that GHC outputs its error messages o.n.e . c.h.a.r.a.c.t.e.r . a.t . a . t.i.m.e.)
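
To illustrate the first problem in the list above, here's a toy library module; the names are hypothetical, not taken from any real library. Its splice works fine, but its expanded code does not.

    {-# LANGUAGE TemplateHaskell #-}
    -- Toy illustration of the hidden-symbol problem; the module and names
    -- are hypothetical.
    module SomeLib (makeGreeter) where  -- internalGreet is deliberately not exported

    import Language.Haskell.TH

    internalGreet :: String -> String
    internalGreet name = "hello, " ++ name

    -- A splice that expands to a reference to the hidden function.
    makeGreeter :: Q Exp
    makeGreeter = [| internalGreet |]

    -- In a client module,  greet = $(makeGreeter)  compiles fine: the splice
    -- refers to internalGreet by its original, fully qualified name, which
    -- Template Haskell is allowed to do.
    --
    -- But paste the *expanded* code in instead, as the EvilSplicer does:
    --
    --     greet = SomeLib.internalGreet  -- rejected: not in the export list
    --
    -- and GHC refuses it, since ordinary source code can only use exported
    -- symbols. Hence the nine thousand lines of export-list patches.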

Anyway, if you struggle with it, or pay me vast quantities of money, your program will, eventually, link. And that's all I can promise for now.


PS, I hope nobody will ever find this blog post useful in their work.

PPS, Also, if you let this put you off Haskell in any way .. well, don't. You just might want to wait a year or so before doing Haskell on Android.
