Daniel Pocock posted "The multiple repository conundrum in Linux packaging". It is a generally good and useful post, which upstream developers will find helpful for understanding how Debian packages their software, but it contains this statement:
If it is the first download, the maintainer creates a new git repository. If it has been packaged before, he clones the repository. The important point here is that this is not the upstream repository, it is an independent repository for Debian packaging.
The only thing important about that point is that it highlights an unnecessary disconnect between the Debian developer and upstream development, one which upstream will surely find annoying and should certainly not have to be bothered with.
There is absolutely no technical reason to not use the upstream git repository as the basis for the git repository used in Debian packaging. I would never package software maintained in a git repository upstream and not do so.
The details are as follows:
For historical reasons that are continually vanishing in importance, Debian fetishises the tarballs produced by upstream. While upstreams increasingly consider them an unimportant distraction, Debian insists on hoarding them and rolling around on its nest of gleaming pristine tarballs.
I wrote pristine-tar to facilitate this behavior, while also poking fun at it, and perhaps introducing a weak spot with which to eventually slay this particular dragon. It is widely used within Debian.
Anyway, the point is that it's no problem to import upstream's tarball into a clone of their git repository. It's fine if that tarball includes files not present in their git repository. Indeed, upstream can do this at release time if they like. Or Debian developers can do it and push a small quantity of data back to upstream in a branch.
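Mechanically, importing the tarball amounts to laying its contents over a checkout of upstream's release tag and committing the result. Here is a minimal Python sketch of the first half, assuming the usual layout where every file sits under a single top-level directory such as pkg-1.0/. The function name import_tarball is hypothetical, and symlinks and other special members are simply skipped.

```python
import os
import tarfile
from pathlib import PurePosixPath


def import_tarball(tarball_path: str, checkout_dir: str) -> list[str]:
    """Copy the regular files from an upstream tarball over a git checkout.

    Assumes all members live under one top-level directory (e.g. pkg-1.0/),
    which is stripped. Returns the relative paths written, sorted.
    """
    written = []
    with tarfile.open(tarball_path, "r:*") as tar:  # auto-detect compression
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts[1:]  # drop "pkg-1.0/"
            if not parts or ".." in parts:
                continue  # the top-level directory itself, or an unsafe path
            target = os.path.join(checkout_dir, *parts)
            if member.isdir():
                os.makedirs(target, exist_ok=True)
            elif member.isfile():
                os.makedirs(os.path.dirname(target), exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                written.append("/".join(parts))
            # symlinks, devices, etc. are ignored in this sketch
    return sorted(written)
```

Afterwards, git status in the checkout shows exactly what the tarball adds or changes relative to the tag, and committing that on its own branch leaves upstream's history untouched.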
Sometimes tagged releases in upstream git repositories differ from the files in their released tarballs. This is actually, in my experience, less due to autotools generated files, and more due to manual and imperfect release processes, human error, etc. (Arguably, autotools are a form of human error.)
When this happens, and the Debian developer is tracking upstream git, they can quite easily modify their branch to reflect the contents of the tarball as closely as they desire. Or modify the source package uploaded to Debian to include anything left out of the tarball.
My favorite example of this is an upstream who forgot to include their README in their released tarball. Not a made up example; as mentioned tarballs are increasingly an irrelevant side-show to upstreams. If I had been treating the tarball as canonical I would have released a package with no documentation.
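Spotting such differences is easy to automate. The following minimal Python sketch assumes the tarball has a single top-level directory and the tree is a git checkout of the matching tag; the function names are hypothetical. It reports files only in the tarball (often generated ones), files only in the tree (like that missing README), and files whose contents differ.

```python
import os
import tarfile
from pathlib import PurePosixPath


def tree_files(tree_dir: str) -> dict[str, bytes]:
    """Map each file under tree_dir, excluding .git, to its contents."""
    files = {}
    for root, dirs, names in os.walk(tree_dir):
        dirs[:] = [d for d in dirs if d != ".git"]  # don't descend into .git
        for name in names:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, tree_dir).replace(os.sep, "/")
            with open(path, "rb") as f:
                files[rel] = f.read()
    return files


def tarball_files(tarball_path: str) -> dict[str, bytes]:
    """Map each regular file in the tarball, minus its top-level dir, to its contents."""
    files = {}
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts[1:]
            if member.isfile() and parts:
                with tar.extractfile(member) as f:
                    files["/".join(parts)] = f.read()
    return files


def compare(tarball_path: str, tree_dir: str) -> tuple[list[str], list[str], list[str]]:
    """Return (only_in_tarball, only_in_tree, differing), each sorted."""
    tb = tarball_files(tarball_path)
    tr = tree_files(tree_dir)
    only_in_tarball = sorted(tb.keys() - tr.keys())
    only_in_tree = sorted(tr.keys() - tb.keys())
    differing = sorted(p for p in tb.keys() & tr.keys() if tb[p] != tr[p])
    return only_in_tarball, only_in_tree, differing
```

Anything in the second list is something a tarball-only package would silently lose.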
Whenever Debian developers interact with upstream, whether it's by filing bug reports or sending patches, they're going to be referring to refs in the upstream git repository. They need to have that repository available. The closer and better the relationship with upstream, the more the DD will use that repository. Anything that pulls them away from using that repository is going to add friction to dealing with upstream.
There have, historically, been quite a lot of sources of friction: from upstreams who chose one VCS while the DD preferred another, to DDs low on disk space who decided to only version control the debian directory, and not the upstream source code. With disk space increasingly absurdly cheap, and the preponderance of development converging on git, there's no reason for this friction to be allowed to continue.
So using the upstream git repository is valuable. And there is absolutely no technical value, and plenty of potential friction, in maintaining a history-disconnected git repository for Debian packaging.
I recall a discussion on debian-mentors related to this, in particular about how DAK would refuse an orig.tar.gz regenerated from the upstream repository for later Debian revisions of the same upstream release, because tar output is not stable: http://lists.debian.org/debian-mentors/2012/12/msg00044.html
I am not a DD or a DM and I don't know the Debian ecosystem well enough to tell if that is still the case; maybe Joey or some of the readers can comment on that?
Thanks.
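To illustrate why a regenerated orig.tar.gz can fail a checksum comparison: tar headers record per-file timestamps and ownership, gzip records its own timestamp, and member order depends on how the tree is walked. The Python sketch below pins all of those, so identical content yields identical bytes, while changing a single mtime changes the checksum. The helper build_tarball is hypothetical and has nothing to do with how DAK or dpkg-source work. Even with everything pinned, different tar or gzip implementations and compression settings can still produce different bytes, which is the gap pristine-tar's small stored delta is meant to cover.

```python
import gzip
import hashlib
import io
import tarfile


def build_tarball(files: dict[str, bytes], mtime: int) -> bytes:
    """Build a .tar.gz in memory with all variable metadata pinned."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime  # stored in every tar header
            info.uid = info.gid = 0
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(files[name]))
    # gzip writes its own timestamp into the header; pin it to 0 as well
    return gzip.compress(buf.getvalue(), mtime=0)


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


files = {"pkg-1.0/README": b"hello\n", "pkg-1.0/configure.ac": b"AC_INIT\n"}
same_a = checksum(build_tarball(files, mtime=1_500_000_000))
same_b = checksum(build_tarball(files, mtime=1_500_000_000))
touched = checksum(build_tarball(files, mtime=1_500_000_001))
print(same_a == same_b, same_a == touched)  # True False
```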
One interesting difference I've seen between upstream tarballs and the git (or whatever) repository is generated files. The standard autoconf stuff puts additional files in the tarball that do not appear in the git repository (they are generated at the make dist stage).
While they can be generated by the packager this is sometimes troublesome. The alternative is to put the generated files into the repository, but that's bad too.
I'm looking at this right now. The multi-branch formats (enforced by dpm, gbp) are high-maintenance because I have to continuously rebase or merge debian on top of master. gitpkg can work with a single branch, but it requires new debian/changelog entries because it refuses to overwrite the orig tarball. 3.0 (git) looks like it would work but can't be uploaded right now. Ideally I'd have:
It seems reasonably doable: dch, git export, git tag and dpkg-source just need to be combined the right way.
gitpkg might be of use since its design doesn't obviously clash with this one. However, it needs to be configured from a versioned file so that anyone can build the package immediately after cloning; arbitrary hooks are out.
If people do want patch series (which should only be if they don't have push privileges and upstream can't take a pull request in a reasonable timeframe), namespacing tags with debian/v1.0-1 and upstream/v1.0 should be enough to reconstruct that. Upstream is defined as someone who has the privilege to create version numbers; ideally everyone should be able to do so with [~+]username-style namespacing to the left of the version-debversion dash, and lintian would accept these names for "orig" tarballs.
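With that tag scheme, getting from a Debian tag back to its upstream tag is purely mechanical. A minimal Python sketch, assuming tags of exactly the form debian/VERSION-REVISION and upstream/VERSION, with no epoch (git ref names cannot contain a colon) and no username namespacing; the function names are hypothetical. Following Debian's versioning convention, the revision is everything after the last hyphen, so upstream versions that themselves contain hyphens still work.

```python
def split_debian_tag(tag: str) -> tuple[str, str]:
    """Split 'debian/v1.0-1' into ('v1.0', '1')."""
    prefix = "debian/"
    if not tag.startswith(prefix):
        raise ValueError(f"not a Debian tag: {tag!r}")
    # The Debian revision is whatever follows the *last* hyphen.
    upstream_version, sep, revision = tag[len(prefix):].rpartition("-")
    if not sep or not upstream_version or not revision:
        raise ValueError(f"no Debian revision in {tag!r}")
    return upstream_version, revision


def upstream_tag_for(tag: str) -> str:
    """Return the upstream tag a Debian tag is assumed to be built on."""
    upstream_version, _ = split_debian_tag(tag)
    return "upstream/" + upstream_version


def patch_range(tag: str) -> str:
    """Revision range containing the Debian-only commits for this release."""
    return f"{upstream_tag_for(tag)}..{tag}"


print(patch_range("debian/v1.0-1"))  # upstream/v1.0..debian/v1.0-1
```

Passing that range to git format-patch would then regenerate the patch series from the tagged history, provided the Debian commits sit on top of the upstream tag.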