git-annex and funding update

git-annex v7 was released this fall, the culmination of a long effort to add some important new features to git-annex. Rather than go into details about it here, see this LWN article comparing and contrasting git-annex with git lfs.

For three years my work on git-annex had major support from Dartmouth's DataLad project, pushing it into use in the sciences, and driving large improvements in git-annex's API, concurrency support, etc. But that relied on government funding, which has been drying up. Increasingly I have been relying on crowdfunding from git-annex's users.

Now I'm entering a new phase, where DataLad users may also want to support git-annex. So far, McGill's NeuroHub project has committed to supporting its development (funded by the Canada First Research Excellence Fund, for the Healthy Brains for Healthy Lives initiative), but I hope others will too. A diversity of funding sources is best.

A survey of git-annex users is now underway, the first in three years. If you've not already, please participate in it to help direct my newly funded work.

fridge 0.2

My offgrid, solar powered, zero-battery-use fridge has successfully made it through spring, summer, fall, and more than enough winter.

I've proven that it works. I've not gotten food poisoning, though I did lose half a gallon of milk during one super rainy week. I have piles of data, and a whole wiki documenting how I built it. I've written 3,000 lines of control software. It purrs along without any assistance.

Fridge0 consists of a standard chest freezer, an added thermal mass, an inverter, and computer control. It ties into the typical offgrid system of a solar charge controller, battery bank, and photovoltaic panels.

This isn't going to solve global warming or anything, but it does seem much less expensive than traditional offgrid fridge systems, and it ties in with thinking on renewable power such as Low Tech Magazine's Redefining Energy Security: "To improve energy security, we need to make infrastructures less reliable."

I mostly wanted to share the wiki, in case someone wants to build something like this. And to post some data. Here's the summer and fall temperature data.


(More on temperature ranges here.)

I want to be upfront that this is not guaranteed to work in every situation. Here's that time that the milk spoiled. A tropical storm was involved. Most of the time milk stays good 2 to 3 weeks in my fridge.

Some things I might get around to doing eventually:

  • Using a supercapacitor to provide power while shutting down on loss of solar power, instead of the current few minutes of battery use.
  • Also running a freezer, dividing up solar power between them.
  • A self-contained build with its own solar panels and electronics, instead of the current build that uses my house's server and solar panels.
  • A full BOM or kit: just add solar panels and a chest freezer to quickly build your own.

I probably won't be devoting much time to this in the upcoming year, but if anyone wants to build one I'm happy to help you.

effective bug tracking illustrated with AT&T

I'm pleased to have teamed up with AT&T to bring you this illustrated guide to effective bug tracking.

telephone pole with phone box spewing wires, and several obviously cut cables attached

The original issue description was "noise / static on line", and as we can see, AT&T have very effectively closed the ticket: There is no longer any noise, of any kind, on the phone line.

No electrons == no noise, so this is the absolute simplest and most effective fix possible. Always start with the simplest such fix, and be sure to close the problem ticket immediately on fixing. Do not follow up with the issue reporter, or contact them in any way to explain how the issue was resolved.

telephone pole with phone wire wrapped down it and extending across the ground

While in the guts of the system fixing such a bug report, you'll probably see something that could be improved by some light refactoring. It's always a good idea to do that right away, because refactoring can often just solve an issue on its own somehow. (Never use your own issue tracking system to report issues to yourself to deal with later, because that would just be bonkers.)

But don't go overboard with refactoring. As we see here, when AT&T decided to run a new line between two poles involved in my bug report, they simply ran it along the ground next to my neighbor's barn. A few festive loops and bows prevent any possible damage by tractor. Can always refactor more later.

phone cable tied to pole

The only other information included in my bug report was "house at end of loong driveway". AT&T helpfully limited the size of the field to something smaller than 1 (old-style) tweet, to prevent some long brain dump being put in there.

You don't want to hear that I've lived here for 7 years and the buried line has never been clean but it's been getting a bit noisier lately, or that I noticed signs of water ingress at two of the junction boxes, or that it got much much worse after a recent snow storm, to the point that I was answering the phone by yelling "my phone line is broken" down a line consumed with static.

cartoon of room 641A, red lines on a screen connect 'nowden', Applebaum, Hess and 'unar' above a 'land line inactive'. Speech bubble: My Holidaze came early this year!

Design your bug tracking system to not let the user really communicate with you. You know what's wrong better than them.

And certainly don't try to reproduce the circumstances of the bug report. No need to visit my house and check the outside line when you've already identified and clearly fixed the problem at the pole.

My second bug report is "no dial tone" with access information "on porch end of long driveway". With that, I seem to be trying to solicit some kind of contact outside the bug tracking system. That is never a good idea though, and AT&T should instruct their linemen to avoid any possible contact with the user, or any attempts to convey information outside the issue tracking system.

laminated handwritten note pinned to phone pole with one red and one green pin. reads: Buried phone line was cut by last lineman -- please repair. House 500 ft up driveway has no dialtone. Santa, all I want for Xmas is a dialtone!

AT&T's issue tracking system reports "Service Restore Date: 12/25/2018 at 12:00 AM" but perhaps they'll provide more effective issue tracking tips for me to share with you. Watch this space.

censored Amazon review of Sandisk Ultra 32GB Micro SDHC Card

★ counterfeits in amazon pipeline

The 32 GB card I bought here at Amazon turned out to be fake. Within days I was getting read errors, even though the card was still mostly empty.

The logo is noticeably blurry compared with a 32 GB card purchased elsewhere. Also, the color of the grey half of the card is subtly wrong, and the lettering is subtly wrong.

Amazon apparently has counterfeit stock in its pipeline; google "amazon counterfeit" for more.

You will not find this review on Sandisk Ultra 32GB Micro SDHC UHS-I Card with Adapter - 98MB/s U1 A1 - SDSQUAR-032G-GN6MA because it was rejected. As far as I can tell my review violates none of Amazon's posted guidelines. But it's specific about how to tell this card is counterfeit, and it mentions a real and ongoing issue that Amazon clearly wants to cover up.

usb drives with no phantom load

For a long time I've not had any network attached storage at home, because it's offgrid and the power budget didn't allow it. But now I have 16 terabytes of network attached storage that uses no power at all when it's not in use, and automatically spins up on demand.

I used a USB hub with per-port power control. But even with a USB drive's port powered down, there's a parasitic draw of around 3 watts per drive. Not a lot, but with 4 drives that's more power wasted than leaving a couple of ceiling lights on all the time. So I put all the equipment behind a relay too, so it can be fully powered down.
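(Back-of-the-envelope: 4 drives × ~3 W is about 12 W of continuous draw, or roughly 0.3 kWh per day -- about what a couple of 5 W LED ceiling lights would use if left on around the clock.)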

I'm using systemd for automounting the drives, and have it configured to power a drive's USB port on and off as needed using uhubctl. This was kind of tricky to work out how to do, but it works very well.

Here's the mount unit for a drive, media-joey-passport.mount:

[Unit]
Description=passport
Requires=startech-hub-port-4.service
After=startech-hub-port-4.service
[Mount]
Options=noauto
What=/dev/disk/by-label/passport
Where=/media/joey/passport

That's on port 4 of the USB hub, the startech-hub-port-4.service unit file is this:

[Unit]
Description=Startech usb hub port 4
PartOf=media-joey-passport.mount
[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/sbin/uhubctl -a on -p 4 ; /bin/sleep 20
ExecStop=/usr/sbin/uhubctl -a off -p 4

The combination of PartOf with Requires and After in these units makes systemd start the port 4 service before mounting the drive, and stop it after unmounting. This was the hardest part to work out.

The sleep 20 is a bit unfortunate; it seems that it can take a few seconds for the drive to power up enough for the kernel to see it, and so without the sleep the mount can fail, leaving the drive powered on indefinitely. Seems there ought to be a way to declare an additional dependency and avoid needing that sleep? Update: See my comment below for a better way.
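In the meantime, one workaround I can imagine (untested, and not the better way from the comment) is to poll for the drive's device node instead of sleeping a fixed 20 seconds, something like:

[Unit]
Description=Startech usb hub port 4
PartOf=media-joey-passport.mount
[Service]
Type=oneshot
RemainAfterExit=true
# Power the port on, then wait up to 30 seconds for the kernel to create
# the device node the mount unit's What= refers to, rather than always
# sleeping 20 seconds.
ExecStart=/bin/sh -c '/usr/sbin/uhubctl -a on -p 4; for i in $(seq 1 30); do [ -e /dev/disk/by-label/passport ] && exit 0; /bin/sleep 1; done; exit 1'
ExecStop=/usr/sbin/uhubctl -a off -p 4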

Finally, the automount unit for the drive, media-joey-passport.automount:

[Unit]
Description=Automount passport
[Automount]
Where=/media/joey/passport
TimeoutIdleSec=300
[Install]
WantedBy=multi-user.target

The TimeoutIdleSec makes it unmount after around 5 minutes of not being used, and then its USB port gets powered off.

I decided to not automate the relay as part of the above, instead I typically turn it on for 5 hours or so, and use the storage whenever I want during that window. One advantage to that is cron jobs can't spin up the drives in the early morning hours.

Dear Ad Networks

In 1 week, I plan to benchmark all your advertisement delivery systems from IP address block 184.20/16.

Please note the attached Intel microcode license may apply to your servers. If you don't want me benchmarking your ad servers, simply blacklist my IP block now.

Love, Joey

PS The benchmarking will continue indefinitely.

two security holes and a new library

For the past week and a half, I've been working on embargoed security holes. The embargo is over, and git-annex 6.20180626 has been released, fixing those holes. I'm also announcing a new Haskell library, http-client-restricted, which could be used to avoid similar problems in other programs.

Working in secret under a security embargo is mostly new to me, and I mostly don't like it, but it seems to have been the right call in this case. The first security hole I found in git-annex turned out to have a wider impact, affecting code in git-annex plugins (aka external special remotes) that uses HTTP. And quite likely beyond git-annex to unrelated programs, but I'll let their developers talk about that. So quite a lot of people were involved in this behind the scenes.

See also: The RESTLESS Vulnerability: Non-Browser Based Cross-Domain HTTP Request Attacks

And then there was the second security hole in git-annex, which took several days to notice, working in collaboration with Daniel Dent. That one's potentially very nasty, allowing decryption of arbitrary gpg-encrypted files, although exploiting it would be hard. It logically followed from the first security hole, so it's good that the first hole was under embargo long enough for us to think it all through.

These security holes involved HTTP servers doing things to exploit clients that connect to them. For example, when a client asks an HTTP server for the content of a file stored on it, the server can redirect it to a file:// URL on the client's own disk, or to http://localhost/, or to a private web server on the client's internal network. Once the client is tricked into downloading such private data, the confusion can result in that private data being exposed. See the advisory for details.

Fixing this kind of security hole is not necessarily easy, because we use HTTP libraries, often via an API library, which may not give much control over following redirects. DNS rebinding attacks can be used to defeat security checks, if the HTTP library doesn't expose the IP address it's connecting to.

I faced this problem in git-annex's use of the Haskell http-client library. So I had to write a new library, http-client-restricted. Thanks to the good design of the http-client library, particularly its Manager abstraction, my library extends it rather than needing to replace it, and can be used with any API library built on top of http-client.
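To make the idea concrete, here's a rough sketch of the kind of redirect vetting involved, written against plain http-client. This is not the http-client-restricted API (names like allowedTarget and fetchChecked are made up for illustration); it just disables automatic redirect following and checks each absolute Location target against a deny-list:

{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP.Client
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.HTTP.Types.Status (statusIsRedirection)
import qualified Data.ByteString.Char8 as B
import Data.List (isPrefixOf)

-- Reject targets that point at obviously private things. A hostname
-- check like this is not enough on its own; DNS rebinding can still
-- defeat it, which is why the real fix also needs control over the IP
-- address actually connected to.
allowedTarget :: String -> Bool
allowedTarget url =
    not ("file:" `isPrefixOf` url)
    && not ("http://localhost" `isPrefixOf` url)
    && not ("http://127." `isPrefixOf` url)

-- With redirectCount = 0, http-client returns 3xx responses instead of
-- following them, so we can vet each (absolute) Location target
-- ourselves before following it.
fetchChecked :: Manager -> String -> IO (Response ())
fetchChecked mgr url
    | not (allowedTarget url) = error ("refusing to fetch " ++ url)
    | otherwise = do
        req <- parseRequest url
        resp <- httpNoBody req { redirectCount = 0 } mgr
        case lookup "Location" (responseHeaders resp) of
            Just loc | statusIsRedirection (responseStatus resp) ->
                fetchChecked mgr (B.unpack loc)
            _ -> return resp

main :: IO ()
main = do
    mgr <- newManager tlsManagerSettings
    r <- fetchChecked mgr "http://example.com/some/file"
    print (responseStatus r)

The actual library goes further than this sketch, restricting the addresses the Manager will connect to at all, so that the same policy also covers DNS rebinding and non-redirect cases.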

I get the impression that a lot of other languages' HTTP libraries need similar things developed. Much like web browsers need to enforce same-origin policies, HTTP clients need to be able to reject certain redirects according to the security needs of the program using them.

I kept a private journal while working on these security holes, and am publishing it now:

the single most important criterion when replacing Github

I could write a lot of things about the Github acquisition by Microsoft. About Github's embrace and extend of git, and how it passed unnoticed by people who fear the same thing now that Microsoft is in the picture. About the stultifying effects of Github's centralization, and its retardant effect on general innovation in spaces around git and software development infrastructure.

Instead I'd rather highlight one simple criterion you can consider when you are evaluating any git hosting service, whether it's Gitlab or something self-hosted, or federated, or P2P[1], or whatever:

Consider all the data that's used to provide the value-added features on top of git. Issue tracking, wikis, notes in commits, lists of forks, pull requests, access controls, hooks, other configuration, etc.
Is that data stored in a git repository?

Github avoids doing that and there's a good reason why: By keeping this data in their own database, they lock you into the service. Consider if Github issues had been stored in a git repository next to the code. Anyone could quickly and easily clone the issue data, consume it, and write alternative issue tracking interfaces, which would then start accepting git pushes of issue updates and syncing them all around. That would have quickly become the de facto distributed issue tracking data format.

Instead, Github stuck it in a database, with a rate-limited API, and while this probably had as much to do with expediency, and a certain centralized mindset, as intentional lock-in at first, it's now become such good lock-in that Microsoft felt Github was worth $7 billion.

So, if whatever thing you're looking at instead of Github doesn't do this, it's at worst hoping to emulate Github's lock-in, or at best neglecting an opportunity to get us out of the trap we now find ourselves in.


[1] Although in the case of a P2P system which uses a distributed data structure, that can have many of the same benefits as using git. So, git-ssb, which stores issues etc as ssb messages, is just as good, for example.

fridge 0.1

Imagine something really cool, like a fridge connected to a powerwall, powered entirely by solar panels. What could be cooler than that?

How about a fridge powered entirely by solar panels without the powerwall? Zero battery use, and yet it still preserves your food.

That's much cooler, because batteries, even hyped ones like the powerwall, are expensive and inefficient and have limited cycles. Solar panels are cheap and efficient now. With enough solar panels that the fridge has power to cool down most days (even cloudy days), and a smart enough control system, the fridge itself becomes the battery -- a cold battery.

I'm live coding my fridge, with that goal in mind. You can follow along in this design thread on secure scuttlebutt, and my git commits, and you can watch real-time data from my fridge.

Over the past two days, which were not especially sunny, my 1 kilowatt of solar panels has managed to cool the fridge down close to standard fridge temperatures. The temperature remains steady overnight thanks to added thermal mass in the fridge. My food seems safe in it, despite it being powered off for 14 hours each night.

graph of fridge temperature, starting at 13C and trending downwards to 5C over 24 hours

(Numbers in this graph are running higher than the actual temps of food in the fridge, for reasons explained in the scuttlebutt thread.)

Of course, the long-term viability of a fridge that never draws from a battery is TBD; I'll know within a year if it works for me.

bunch of bananas resting on top of chest freezer fridge conversion

I've written about the coding side of this project before, in my haskell controlled offgrid fridge. The reactive-banana-automation library is working well in this application. My AIMS inverter control board and easy-peasy-devicetree-squeezy were other groundwork for this project.