Re: Debugging over email

Lars wrote about the remote debugging problem.

I write free software and I have some users. My primary support channels are over email and IRC, which means I do not have direct access to the system where my software runs. When one of my users has a problem, we go through one or more cycles of them reporting what they see and me asking them for more information, or asking them to try this thing or that thing and report results. This can be quite frustrating.

I want, nay, need to improve this.

This is also something I've thought about on and off, that affects me most every day.

I've found that building the test suite into the program, such that users can run it at any time, is a great way to smoke out problems. If a user thinks they have problem A but the test suite explodes, or also turns up problems B C D, then I have much more than the user's problem report to go on. git annex test is a good example of this.

Asking users to provide a recipe to reproduce the bug is very helpful; I do it in the git-annex bug report template, and while not all users do, and users often provide a reproducion recipe that doesn't quite work, it's great in triage to be able to try a set of steps without thinking much and see if you can reproduce the bug. So I tend to look at such bug reports first, and solve them more quickly, which tends towards a virtuous cycle.

I've noticed that reams of debugging output, logs, test suite failures, etc can be useful once I'm well into tracking a problem down. But during triage, they make it harder to understand what the problem actually is. Information overload. Being able to reproduce the problem myself is far more valuable than this stuff.

I've noticed that once I am in a position to run some commands in the environment that has the problem, it seems to be much easier to solve it than when I'm trying to get the user to debug it remotely. This must be partly psychological?

Partly, I think that the feeling of being at a remove from the system, makes it harder to think of what to do. And then there are the times where the user pastes some output of running some commands and I mentally skip right over an important part of it. Because I didn't think to run one of the commands myself.

I wonder if it would be helpful to have a kind of ssh equivilant, where all commands get vetted by the remote user before being run on their system. (And the user can also see command output before it gets sent back, to NACK sending of personal information.) So, it looks and feels a lot like you're in a mosh session to the user's computer (which need not have a public IP or have an open ssh port at all), although one with a lot of lag and where rm -rf / doesn't go through.

Posted
twenty years of free software -- part 13 past and future

This series has focused on new projects. I could have said more about significant work that didn't involve starting new projects. A big example was when I added dh to debhelper, which led to changes in a large percentage of debian/rules files. And I've contributed to many other free software projects.

I guess I've focused on new projects becuase it's easier to remember things I've started myself. And because a new project is more wide open, with more scope for interesting ideas, especially when it's free software being created just because I want to, with no expectations of success.

But starting lots of your own projects also makes you responsible for maintaining a lot of stuff. Ten years ago I had dozens of projects that I'd started and was still maintaining. Over time I started pulling away from Debian, with active projects increasingly not connected with it. By the end, I'd left and stopped working on the old projects. Nearly everything from my first decade in free software was passed on to new maintainers. It's good to get out from under old projects and concentrate on new things.

I saved propellor for last in this series, because I think it may point toward the future of my software. While git-annex was the project that taught me Haskell, propellor's design is much more deeply influenced by the Haskell viewpoint.

Absorbing that viewpoint has itself been a big undertaking for me this decade. It's like I was coasting along feeling at the top of my game and wham got hit by the type theory bus. And now I see that I was stuck in a rut before, and begin to get a feeling of many new possibilities.

That's a good feeling to have, twenty years in.

Posted
twenty years of free software -- part 12 propellor

Propellor is my second big Haskell program. I recently described the motivation for it like this, in a proposal for a Linux.Conf.Au talk:

The configuration of Linux hosts has become increasingly declarative, managed by tools like puppet and ansible, or by the composition of containers. But if a server is a collection of declarative properties, how do you make sure that changes to that configuration make sense? You can test them, but eventually it's 3 AM and you have an emergency fix that needs to go live immediately.

Data types to the rescue! While data types are usually used to prevent eg, combining an Int and a Bool, they can be used at a much more abstract level, for example to prevent combining a property that needs a Debian system with a property that needs a Red Hat system.

Propellor leverages Haskell's type system to prove the consistency of the properties it will apply to a host.

The real origin story though, is that I wanted to finally start using configuration management, but the tools for it all seemed very complicated and built on shaky foundations (like piles of yaml), and it seemed it would be easier to write my own than deal with that. Meanwhile, I had Haskell burning a hole in my pocket, ready to be used in a second large project after git-annex.

Propellor has averaged around 2.5 contributions per month from users since it got started, but increasing numbers recently. That's despite having many fewer users than git-annex, which remember gets perhaps 1 patch per month.

Of course, I've "cheated" by making sure that propellor's users know Haskell, or are willing to learn some. And, propellor is very compositional; adding a new property to it is not likely to be complicated by any of the existing code. So it's easy to extend, if you're able to use it.

At this point propellor has a small community of regular contributors, and I spend some pleasant weekend afternoons reviewing and merging their work.

Much of my best work on propellor has involved keeping the behavior of the program the same while making its types better, to prevent mistakes. Propellor's core data types have evolved much more than in any program I worked on before. That's exciting!

Next: twenty years of free software -- part 13 past and future

twenty years of free software -- part 11 concurrent-output

concurrent-output is a more meaty Haskell library than the ones I've covered so far. Its interface is simple, but there's a lot of complexity under the hood. Things like optimised console updates, ANSI escape sequence parsing, and transparent paging of buffers to disk.

It developed out of needing to display multiple progress bars on the console in git-annex, and also turned out to be useful in propellor. And since it solves a general problem, other haskell programs are moving toward using it, like shake and stack.

Next: twenty years of free software -- part 12 propellor

Posted
twenty years of free software -- part 10 shell-monad

shell-monad is a small project, done over a couple days and not needing many changes since, but I'm covering it separately because it was a bit of a milestone for me.

As I learned Haskell, I noticed that the libraries were excellent and did things to guide their users that libraries in other languages don't do. Starting with using types and EDSLs and carefully constrained interfaces, but going well beyond that, as far as applying category theory. Using these libraries push you toward good solutions.

shell-monad was a first attempt at building such a library. The shell script it generates should always be syntactically valid, and never forgets to quote a shell variable. That's only the basics. It goes further by making it impossible to typo the name of a shell variable or shell function. And it uses phantom types so that the Haskell type checker can check the types of shell variables and functions match up.

So I think shell-monad is pretty neat, and I certianly learned a lot about writing Haskell libraries making it. Including how much I still have to learn!

I have not used shell-monad much, but keep meaning to make propellor and git-annex use it for some of their shell script needs. And ponder porting etckeeper to generate its shell scripts using it.

Next: twenty years of free software -- part 11 concurrent-output

Posted
twenty years of free software -- part 9 small projects

ey dad sometimes asks when I'll finish git-annex. The answer is "I don't know" because software like that doesn't have a defined end point; it grows and changes in response to how people use it and how the wider ecosystem develops.

But other software has a well-defined end point and can be finished. Some of my smaller projects that are more or less done include electrum-mnemonic, brainfuck-monad, scroll, yesod-lucid haskell-mountpoints.

Studies of free software projects have found that the average free software project was written entirely by one developer, is not very large, and is not being updated. That's often taken to mean it's a failed or dead project. But all the projects above look that way, and are not failures, or dead.

It's good to actually finish some software once in a while!

Next: twenty years of free software -- part 10 shell-monad

twenty years of free software -- part 8 github-backup

github-backup is an attempt to take something I don't like -- github's centralization of what should be a decentralized techology -- and find a constrictive way to make it at least meet my baseline requirements for centralized systems. Namely that when they go away, I don't lose data.

So, it was written partly with my ArchiveTeam hat on.

A recent bug filed on it, Backup fails for repositories unavailable due to DMCA takedown made me happy, because it shows github-backup behaving more or less as intended, although perhaps not in the optimal way.

By the way, this is the only one of my projects that uses github for issue tracking. Intentionally ironically.

It was my second real Haskell program (after git-annex) and so also served as a good exercise in applying what I'd learned about writing Haskell up to that point.

Next: twenty years of free software -- part 9 small projects

Posted
twenty years of free software -- part 7 git-annex

My first Haskell program, and the only software I've written that was inspired by living in a particular place, git-annex has received the lion's share of my time for five years.

It was written just to solve my own problem, but in a general way, that turned out to be useful in lots of other situations. So over the first half a year or so, it started attracting some early adopters who made some very helpful suggestions.

Then I did the git-annex assistant kickstarter, and started blogging about each day I worked on it. Four years of funding and seven hundred and twenty one posts later, the git-annex devblog is still going. So, I won't talk about technical details in this post, they've all been covered.

One thing I wondered when starting git-annex -- besides whether I would be able to write it in Haskell at all -- was would that prevent it from getting many patches. I count roughly 65 "thanks" messages in the changelog, so it gets perhaps one patch contributed per month. It's hard to say if that's a lot or a little.

Part of git-annex is supporting various cloud storage systems via "special remotes". Of those not written by me, only 1 was contributed in Haskell. Compare with 13 that use the plugin system that lets other programming languages be used.

The other question about using Haskell is, did it make git-annex a better program. I think it did. The strong type system did prevent plenty of bugs, although there have still been some real howlers. The code is still not taking full advantage of the power of Haskell's type system, on the other hand it uses many Haskell libraries that do leverage the type system more. I've done more small and large refactorings of git-annex than on any other program I've written, because the strong types and referential transparency makes refactoring easier and safer in Haskell.

And the code has turned out to be much more flexible, for all its static types, than the kind of code I was writing before. Examples include building the git-annex assistant, which uses the rest of git-annex as a library, and making git-annex run actions concurrently, thanks to there being no global variables to complicate things (and excellent support for concurrency and parallelism in Haskell).

So: Glad I wrote it, glad I used Haskell for it, estatic that many other people have found it useful, and astounded that I've been funded to work on it for four years.

Next: twenty years of free software -- part 8 github-backup

Posted
Joey
Hacker Holler

A quiet place in which to get away and code is all I was looking for when I moved here. I found much more, but that's still the essence of the place.

On returning home from the beach, I've just learned that after several years renting this house, I will soon have to leave, or buy it.

The house is an EarthShip, tucked away in its own private holler (as we say here in the Appalachian Mtns of Tennessee), below a mountain that is National Forest, two miles down back roads from a river.

A wonderful place to relax and code, but developing only free software for twenty years doesn't quite stretch to being able to afford buying this kind of place.

But, I got to thinking of times friends were able to visit me here. Grilling over wood fires with friends from Debian. Steep hikes and river swims. Sharing dialup bandwidth between our Linux laptops. A bunch of us discussing Haskell in the living room at midnight. And too, I've many times talked about the place with someone who got a gleam in their eye, imagining themselves living there.

And then there's my Yurt, my relief valve before I moved here. And a great spot I like to visit on an old logging road above a creek.

Could we put all this together somehow? Might a handful of my friends be able to contribute somewhere in the range of $10 thousand to buy in?

Update: That was too much to hope for as it turned out. But, this post did lead to some possibilities, which might let me afford the place. Stay tuned.

Posted
twenty years of free software -- part 6 moreutils

moreutils is a little love letter to the Unix Tools philosophy. It was interesting to try to find new tools as basic as cat and ls. With sponge and vidir and ifne and chronic and others, we managed to find several such tools.

So, it was fun to work on moreutils, but it also ran into inherent problems with the Unix Tools philosophy. One is namespacing; there are only so many good short names for commands, and a command like parallel can easily collide with something else. And since Unix Tools have a bigger surface area than a pure function, my parallel is not going to be quite compatible with your parallel, even if they were developed with (erm) parallel intentions.

Partly due to that problem, I have gotten pickier about adding new tools to moreutils as it's gotten older, and so there's a lot of suggested additions that I will probably never get to.

And as my mention of pure functions suggests, I have kind of moved on from being a big fan of the Unix Tools philosophy. Unix tools are a decent approximation of pure functions for their time, but they are not really pure, and not typed at all, and not usefully namespaced, and this limits them.

Next: twenty years of free software -- part 7 git-annex

Posted