I could write a lot of things about the Github acquisition by Microsoft. About Github's embrace and extend of git, and how it passed unnoticed by people who fear the same thing now that Microsoft is in the picture. About the stultifying effects of Github's centralization, and its retardant effect on innovation in the spaces around git and software development infrastructure.
Instead I'd rather highlight one simple criteria you can consider when you are evaluating any git hosting service, whether it's Gitlab or something self-hosted, or federated, or P2P[1], or whatever:
Consider all the data that's used to provide the value-added features on
top of git. Issue tracking, wikis, notes in commits, lists of forks, pull
requests, access controls, hooks, other configuration, etc.
Is that data stored in a git repository?
Github avoids doing that, and there's a good reason why: by keeping this data in their own database, they lock you into the service. Consider if Github issues had been stored in a git repository next to the code. Anyone could quickly and easily clone the issue data, consume it, and write alternative issue tracking interfaces, which could then start accepting git pushes of issue updates and syncing them all around. That would have quickly become the de-facto distributed issue tracking data format.
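A minimal sketch of what that could have looked like, assuming a purely hypothetical convention (nothing Github ever shipped) where each issue is a file under issues/ on a dedicated branch, so any clone carries the whole tracker and any tool can read or sync it with ordinary pushes and pulls:

```python
#!/usr/bin/env python3
"""Sketch: issues stored as files on a git branch (hypothetical format)."""
import subprocess

def list_issues(repo_path, branch="issues"):
    # `git ls-tree` enumerates the branch's files without needing a checkout.
    out = subprocess.run(
        ["git", "-C", repo_path, "ls-tree", "-r", "--name-only", branch],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.splitlines() if p.startswith("issues/")]

def read_issue(repo_path, path, branch="issues"):
    # `git show branch:path` reads one issue file straight from the branch.
    return subprocess.run(
        ["git", "-C", repo_path, "show", f"{branch}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    for path in list_issues("."):
        print(path)
```

An alternative interface would only need to commit new files to that branch and push; merging of independent updates comes for free from git itself.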
Instead, Github stuck it in a database, with a rate-limited API, and while at first this probably had as much to do with expediency and a certain centralized mindset as with intentional lock-in, it's now become such good lock-in that Microsoft felt Github was worth $7 billion.
So, if whatever thing you're looking at instead of Github doesn't do this, at worst it's hoping to emulate that lock-in, and at best it's neglecting an opportunity to get us out of the trap we now find ourselves in.
[1] Although a P2P system which uses a distributed data structure can have many of the same benefits as using git. So git-ssb, which stores issues etc as ssb messages, is just as good, for example.
And data is the plural of datum. Except in common usage neither is.
And this kind of pedantry feels like a tactic.
Oh, and your comment was committed to git, so at least it demonstrated something useful.
That first paragraph is a killer opener, like a song that's ridiculously good because of how brutal it manages to be, with every line packing a punch.
Having said that, it doesn't follow that because GitHub made deliberate choices for anticompetitive lock-in, then another solution—say, a P2P one—should therefore store projects' "extras" in Git repos.
In other words:
"If $PARTY had chosen Git for $USECASE, it wouldn't have let them achieve lock-in" does not mean "project extras stored in anything besides Git necessarily result in lock-in." It's possible that Git repos are entirely the wrong abstraction for this. Indeed, insisting on stuffing all the auxiliary stuff into Git too (with all the implications of choosing those constraints) strongly smells of that phenomenon where people pick up one abstraction and then try to shoehorn it in everywhere else, regardless of whether it's well-suited for that use case or not. I write this even knowing your affinity for and relationship to Git through git-annex & co.
@colbyrussell, my guess is you logged in by email with joeyh@yourdomain.com. I've adjusted the nickname displayed.
To your point: I have done a fair bit of exploration of what makes sense to shoehorn into git and how to do it, and issues and PRs and project configuration data are really pretty trivial to do. And it's been done, many times before -- https://dist-bugs.branchable.com/ -- but https://xkcd.com/927/ applies to those efforts. So I'm not suggesting this from a naive position.
And while your logic is impeccable, so is the relentless logic of VC backed startups and how they develop. No criteria are perfect (see above footnote), but I do feel this is a good one, and better in this case than "is it free software?" for pointing in the direction of some solutions. (Combine the two criteria and you get the Franklin Street Statement.)
Fedora's Pagure mostly satisfies this requirement. All issues, comments, PR metadata, etc. are stored in Git repositories internally. You can pull, commit, and push that data as you would any other repository. It also supports "remote pull requests", where the git repository of the source branch isn't hosted on the same server as the repo being targeted. The source repo doesn't even have to run Pagure!
It uses Gitolite underneath to manage ACLs, and that isn't managed through Git repos, but instead through the application with settings in the database. But I'd argue all the meaningful information is available in a useful form.
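For example, something along these lines can pull a project's entire tracker out (the ssh URL pattern for ticket repos and the JSON-blob-per-ticket layout are from memory, so verify them against the Pagure documentation for your instance):

```python
#!/usr/bin/env python3
"""Sketch: extracting a Pagure project's ticket data from its ticket repo."""
import json
import pathlib
import subprocess

def clone_tickets(project, dest):
    # Assumed URL scheme; pagure.io keeps tickets in a separate git repo.
    url = f"ssh://git@pagure.io/tickets/{project}.git"
    subprocess.run(["git", "clone", url, dest], check=True)

def load_tickets(dest):
    tickets = []
    for path in pathlib.Path(dest).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            try:
                tickets.append(json.loads(path.read_text()))
            except (ValueError, UnicodeDecodeError):
                pass  # skip anything that isn't a JSON ticket blob
    return tickets

if __name__ == "__main__":
    clone_tickets("pagure", "/tmp/pagure-tickets")
    print(len(load_tickets("/tmp/pagure-tickets")), "tickets")
```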
Not quite true. Both gists and wiki pages are backed by git.
The API rate limit is 5000 requests per hour. If that's a barrier to you extracting your project data, then you're clearly doing something wrong.
Also, the word you're looking for is "criterion". Be more precise in your vocabulary!
I wholeheartedly agree on the vendor lock-in with issues and pull requests. I've had a feeling that GitHub, however well-intentioned, sort of hijacked open source communities. A few months ago, that pushed me to develop a resilient, future-proof, decentralized issue tracker that I later generalized for other problems as well (SIT). I wrote a bit about my vision when I first announced it: “Under the Covers of Code”
I really wanted to build something as future-proof and resilient as I could imagine, so SIT ended up not being specific to Git or any other SCM, instead relying on simple additive sets of files.
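I won't reproduce SIT's actual record format here, but the general shape of "additive sets of files" is easy to sketch. Every update becomes a new file named by the hash of its content, nothing is ever rewritten, and merging two stores is just a set union -- the field names and hashing below are illustrative, not SIT's real on-disk format:

```python
#!/usr/bin/env python3
"""Sketch of an additive set of files (illustrative, not SIT's format)."""
import hashlib
import json
import pathlib
import time

def append_record(store, **fields):
    # Serialize deterministically, then name the file by its content hash.
    fields.setdefault("timestamp", time.time())
    blob = json.dumps(fields, sort_keys=True).encode()
    name = hashlib.sha256(blob).hexdigest()
    path = pathlib.Path(store) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # additive: existing records are never touched
        path.write_bytes(blob)
    return name

# Merging two stores is copying over whichever files the other side lacks;
# identical content hashes to the identical name, so there are no conflicts.
append_record("tracker", issue="1", comment="Reported upstream")
```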
I wonder how many closed-source companies using github have managers that remember "DOS ain't done 'till Lotus don't run" and are regretting making their family jewels available to MS. And someone else being paid for it.
I haven't seen the closed-source, private repo contracts, so they might have appropriate protections.
Checking each issue for new comments takes an API call. The API is paginated == more API calls the more issues and comments there are. Finding comments attached to commits is additional API calls (and there was not a way to discover them other than trying every commit in turn last I checked). Repositories can have forks, which can have issues, which can have comments, for piles more API calls. (I know about this in some detail because I've written software to deal with all of it. It's constantly rate limited.)
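To make that concrete, here's roughly what polling one repo's issue comments looks like. The endpoint, pagination parameters, and rate-limit header are the documented REST v3 ones; the polling strategy is a simplified illustration, not the actual software I mentioned:

```python
#!/usr/bin/env python3
"""Sketch: counting the API calls needed to poll one repo's issue comments."""
import json
import urllib.request

API = "https://api.github.com/repos/{owner}/{repo}/issues/comments"

def fetch_all_comments(owner, repo):
    comments, page, calls = [], 1, 0
    while True:
        url = API.format(owner=owner, repo=repo) + f"?per_page=100&page={page}"
        # An Authorization token header is what gets you the 5000/hour limit.
        req = urllib.request.Request(
            url, headers={"Accept": "application/vnd.github.v3+json"})
        with urllib.request.urlopen(req) as resp:
            calls += 1
            remaining = resp.headers.get("X-RateLimit-Remaining")
            batch = json.load(resp)
        if not batch:
            break
        comments.extend(batch)
        page += 1
    # One call per 100 comments, per repo, per poll. Multiply by dozens of
    # repos, their forks, and frequent polling, and the quota evaporates.
    return comments, calls, remaining
```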
And then, most developers are not involved with a single software project. It's not uncommon to have dozens of dependencies you want to keep an eye on, or are tangentially involved with, in your work on a single project -- let alone all your other projects.
12 hours to check something out is not fertile ground for a distributed ecosystem to develop. As evidence, see the lack of such an ecosystem.
They both popularized and exploited a few design flaws of git. Rushing to put all our eggs into another centralized basket is not really the lesson to be learned here.
Git is not the only DVCS. It's just the one clinging to the repositories-are-file-shrubbery thing. Nobody HAS TO make do with it merely because it's popular (again, GitHub's fault). Decentralization buys you nothing with an implied monoculture.