two security holes and a new library

For the past week and a half, I've been working on embargoed security holes. The embargo is over, and git-annex 6.20180626 has been released, fixing those holes. I'm also announcing a new Haskell library, http-client-restricted, which could be used to avoid similar problems in other programs.

Working in secret under a security embargo is mostly new to me, and I mostly don't like it, but it seems to have been the right call in this case. The first security hole I found in git-annex turned out to have a wider impact, affecting code in git-annex plugins (aka external special remotes) that uses HTTP. And quite likely beyond git-annex to unrelated programs, but I'll let their developers talk about that. So quite a lot of people were involved in this behind the scenes.

See also: The RESTLESS Vulnerability: Non-Browser Based Cross-Domain HTTP Request Attacks

And then there was the second security hole in git-annex, which took several days to notice, working in collaboration with Daniel Dent. That one's potentially very nasty, allowing decryption of arbitrary gpg-encrypted files, although exploiting it would be hard. It logically followed from the first security hole, so it's good that the first was under embargo long enough for us to think it all through.

These security holes involved HTTP servers doing things to exploit clients that connect to them. For example, when a client asks an HTTP server for the content of a file stored on it, the server can redirect to a file:// URL on the client's disk, or to http://localhost/, or to a private web server on the client's internal network. Once the client is tricked into downloading such private data, the confusion can result in that private data being exposed. See the advisory for details.

Fixing this kind of security hole is not necessarily easy, because we use HTTP libraries, often via an API library, which may not give much control over following redirects. DNS rebinding attacks can be used to defeat security checks if the HTTP library doesn't expose the IP address it's actually connecting to.

I faced this problem in git-annex's use of the Haskell http-client library. So I had to write a new library, http-client-restricted. Thanks to the good design of the http-client library, particularly its Manager abstraction, my library extends it rather than needing to replace it, and can be used with any API library built on top of http-client.
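For illustration, here's a minimal sketch of the kind of IP address check such a library needs to make, with made-up names; this is not http-client-restricted's actual API, just the underlying idea. Because the check runs on the resolved address, just before connecting, it defeats DNS rebinding as well as malicious redirects.

-- A sketch of the kind of address check needed; not the actual
-- http-client-restricted API. Rejects connections to loopback,
-- link-local, and RFC 1918 private addresses.
import Network.Socket

allowedAddr :: AddrInfo -> Bool
allowedAddr ai = case addrAddress ai of
        SockAddrInet _ ha -> case hostAddressToTuple ha of
                (127, _, _, _) -> False                       -- loopback
                (10, _, _, _) -> False                        -- RFC 1918
                (172, b, _, _) | b >= 16 && b <= 31 -> False  -- RFC 1918
                (192, 168, _, _) -> False                     -- RFC 1918
                (169, 254, _, _) -> False                     -- link-local
                _ -> True
        _ -> False  -- also reject IPv6 local ranges, unix sockets, etc

The hard part is getting the HTTP library to consult a check like this on every connection it opens, including ones made while following redirects; that's what hooking into http-client's Manager makes possible.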

I get the impression that a lot of other languages' HTTP libraries need similar things developed. Much like web browsers need to enforce same-origin policies, HTTP clients need to be able to reject certain redirects according to the security needs of the program using them.

I kept a private journal while working on these security holes, and have now published it.

the single most important criterion when replacing Github

I could write a lot of things about the Github acquisition by Microsoft. About Github's embrace and extend of git, and how that passed unnoticed by people who now fear the same from Microsoft. About the stultifying effects of Github's centralization, and its retardant effect on innovation in the spaces around git and software development infrastructure.

Instead I'd rather highlight one simple criterion you can consider when you are evaluating any git hosting service, whether it's Gitlab or something self-hosted, or federated, or P2P[1], or whatever:

Consider all the data that's used to provide the value-added features on top of git. Issue tracking, wikis, notes in commits, lists of forks, pull requests, access controls, hooks, other configuration, etc.
Is that data stored in a git repository?

Github avoids doing that, and there's a good reason why: by keeping this data in their own database, they lock you into the service. Consider if Github issues had been stored in a git repository next to the code. Anyone could quickly and easily clone the issue data, consume it, and write alternative issue tracking interfaces, which could then start accepting git pushes of issue updates and syncing them all around. That would have quickly become the de-facto distributed issue tracking data format.

Instead, Github stuck it in a database with a rate-limited API, and while at first this probably had as much to do with expediency and a certain centralized mindset as with intentional lock-in, it has since become such good lock-in that Microsoft felt Github was worth $7 billion.

So, if whatever thing you're looking at instead of Github doesn't store that data in git, it's at worst hoping to emulate Github's lock-in, or at best neglecting an opportunity to get us out of the trap we now find ourselves in.


[1] Although a P2P system that uses a distributed data structure can have many of the same benefits as using git. So git-ssb, which stores issues etc as ssb messages, is just as good, for example.

fridge 0.1

Imagine something really cool, like a fridge connected to a powerwall, powered entirely by solar panels. What could be cooler than that?

How about a fridge powered entirely by solar panels without the powerwall? Zero battery use, and yet it still preserves your food.

That's much cooler, because batteries, even hyped ones like the powerwall, are expensive and inefficient, and wear out after a limited number of charge cycles. Solar panels are cheap and efficient now. With enough solar panels that the fridge has power to cool down on most days (even cloudy days), and a smart enough control system, the fridge itself becomes the battery -- a cold battery.
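To make that concrete, here's a toy sketch of such a control rule, not my actual controller code, and with made-up threshold numbers: run the compressor whenever there's surplus solar power and the food isn't about to freeze, so the thermal mass gets "charged" during the day and coasts overnight.

-- Toy sketch of the fridge-as-battery idea; not the real controller.
-- Chill aggressively while the sun provides surplus power, and
-- otherwise run only when food safety demands it.
data Power = PowerOn | PowerOff deriving Show

fridgeControl :: Double -> Double -> Power
fridgeControl solarsurpluswatts tempC
        | tempC > maxsafe = PowerOn     -- must cool, sun or no sun
        | solarsurpluswatts > 200 && tempC > floortemp = PowerOn
        | otherwise = PowerOff
  where
        maxsafe = 5     -- degrees C; food safety limit
        floortemp = 0.1 -- don't freeze the food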

I'm live coding my fridge, with that goal in mind. You can follow along in this design thread on secure scuttlebutt, and my git commits, and you can watch real-time data from my fridge.

Over the past two days, which were not especially sunny, my 1 kilowatt of solar panels has managed to cool the fridge down close to standard fridge temperatures. The temperature remains steady overnight thanks to added thermal mass in the fridge. My food seems safe in it, despite it being powered off for 14 hours each night.

graph of fridge temperature, starting at 13C and trending downwards to 5C over 24 hours

(Numbers in this graph are running higher than the actual temps of food in the fridge, for reasons explained in the scuttlebutt thread.)

Of course, the long-term viability of a fridge that never draws from a battery is TBD; I'll know within a year if it works for me.

bunch of bananas resting on top of chest freezer fridge conversion

I've written about the coding side of this project before, in my haskell controlled offgrid fridge. The reactive-banana-automation library is working well in this application. My AIMS inverter control board and easy-peasy-devicetree-squeezy were other groundwork for this project.

more fun with reactive-banana-automation

My house knows when people are using the wifi, and keeps the inverter on so that the satellite internet is powered up, unless the battery is too low. When nobody is using the wifi, the inverter turns off, except when it's needed to power the fridge.

Sounds a little complicated, doesn't it? The code to automate that using my reactive-banana-automation library is almost shorter than the English description, and certainly clearer.

inverterPowerChange :: Sensors t -> MomentAutomation (Behavior (Maybe PowerChange))
inverterPowerChange sensors = do
    lowpower <- lowpowerMode sensors
    fridgepowerchange <- fridgePowerChange sensors
    wifiusers <- numWifiUsers sensors
    return $ react <$> lowpower <*> fridgepowerchange <*> wifiusers
  where
    react lowpower fridgepowerchange wifiusers
            | lowpower = Just PowerOff
            | wifiusers > 0 = Just PowerOn
            | otherwise = fridgepowerchange

Of course, there are complexities under the hood, like where does numWifiUsers come from? (It uses inotify to detect changes to the DHCP leases file, and tracks when leases expire; a sketch of the lease counting is below.) I'm up to 1200 lines of custom code for my house, only 170 lines of which are control code like the above.
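Here's a hypothetical sketch of that lease counting, assuming a dnsmasq-style leases file where each line starts with an expiry time in epoch seconds; my actual code may differ, and the inotify side is elided:

-- Hypothetical sketch: count unexpired leases in a dnsmasq-style
-- leases file, where each line starts with an expiry time in epoch
-- seconds. Lines that don't parse are simply skipped.
import Data.Time.Clock.POSIX (getPOSIXTime)

numLeases :: FilePath -> IO Int
numLeases f = do
        now <- getPOSIXTime
        ls <- lines <$> readFile f
        return $ length
                [ l
                | l <- ls
                , (w:_) <- [words l]
                , [(expiry, "")] <- [reads w :: [(Integer, String)]]
                , fromIntegral expiry > now
                ]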

But the control code is the core, and it's where most of the bugs would be. The goal is to avoid most of the bugs by using FRP and Haskell the way I have, and to catch the rest by testing.

For testing, I'm using doctest to embed test cases along with the FRP code. I designed reactive-banana-automation to work well with this style of testing. For example, here's how it determines when the house needs to be in low power mode, including the tests:

-- | Start by assuming we're not in low power mode, to avoid
-- entering it before batteryVoltage is available.
--
-- If batteryVoltage continues to be unavailable, enter low power mode for
-- safety.
-- 
-- >>> runner <- observeAutomation (runInverterUnless lowpowerMode) (mkSensors (pure ()))
-- >>> runner $ \sensors -> gotEvent (dhcpClients sensors) []
-- []
-- >>> runner $ \sensors -> sensorUnavailable (batteryVoltage sensors)
-- [InverterPower PowerOff]
-- >>> runner $ \sensors -> batteryVoltage sensors =: Volts 25
-- [InverterPower PowerOn]
-- >>> runner $ \sensors -> batteryVoltage sensors =: Volts 20
-- [InverterPower PowerOff]
-- >>> runner $ \sensors -> batteryVoltage sensors =: Volts 25
-- [InverterPower PowerOn]
-- >>> runner $ \sensors -> sensorUnavailable (batteryVoltage sensors)
-- [InverterPower PowerOff]
lowpowerMode :: Sensors t -> MomentAutomation (Behavior Bool)
lowpowerMode sensors = automationStepper False
    =<< fmap calc <$> getEventFrom (batteryVoltage sensors)
  where
    -- Below 24.0 (really 23.5 or so) is danger zone for lead acid.
    calc (Sensed v) = v < Volts 24.1
    calc SensorUnavailable = True

The sensor data is available over http, so I can run this controller code in test mode, on my laptop, and observe how it reacts to real-world circumstances.

joey@darkstar:~/src/homepower>./controller test
InverterPower PowerOn
FridgeRelay PowerOff

Previously: my haskell controlled offgrid fridge

my haskell controlled offgrid fridge

I'm preparing for a fridge upgrade, away from the tiny propane fridge to a chest freezer conversion. My home computer will be monitoring the fridge temperature and the state of my offgrid energy system, and turning the fridge on and off using a relay and the inverter control board I built earlier.

This kind of automation is a perfect fit for Functional Reactive Programming (FRP) since it's all about time-varying behaviors and events being combined together.

Of course, I want the control code to be as robust as possible, well tested, and easy to modify without making mistakes. Pure functional Haskell code.

There are many Haskell libraries for FRP, and I have not looked at most of them in any detail. I settled on reactive-banana because it has a good reputation and amazing testimonials.

"In the programming-language world, one rule of survival is simple: dance or die. This library makes dancing easy." – Simon Banana Jones

But, it's mostly used for GUI programming, or maybe some musical live-coding. There were no libraries for using reactive-banana for the more staid task of home automation, or anything like that. Also, using it directly involves a whole lot of IO code, which is not great for testing.

So I built reactive-banana-automation on top of it to address my needs. I think it's a pretty good library, although I don't have a deep enough grokking of FRP to say that for sure.

Anyway, it's plenty flexible for my fridge automation needs, and I also wrote a motion-controlled light automation with it to make sure it could be used for something else (and to partly tackle the problem of using real-world time events when the underlying FRP library uses its own notion of time).
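For a flavor of what that looks like, here's a hedged sketch of a motion light written in the same style as the fridge example below, using the library's sensedBehavior and onBehaviorChangeMaybe; the motionSensor and LightPower names are made up, and the interesting part, turning off some minutes after the last motion, which is where real-world time comes in, is elided.

-- Sketch of a motion light, in the same style as the fridge example.
-- motionSensor and LightPower are hypothetical names, and the timeout
-- after the last motion event is left out.
motionLight :: Automation Sensors Actuators
motionLight sensors actuators = do
        bmotion <- sensedBehavior (motionSensor sensors)
        let bpower = calcpower <$> bmotion
        onBehaviorChangeMaybe bpower (actuators . LightPower)
  where
        calcpower (Sensed True) = Just PowerOn
        calcpower (Sensed False) = Just PowerOff
        calcpower SensorUnavailable = Nothing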

The code for my fridge is a work in progress since the fridge has not arrived yet, and because the question of in which situations an offgrid fridge should optimally run and not run is really rather complicated.

Here's a simpler example, for a non-offgrid fridge.

fridge :: Automation Sensors Actuators
fridge sensors actuators = do
        -- Create a Behavior that reflects the most recently reported
        -- temperature of the fridge.
        btemperature <- sensedBehavior (fridgeTemperature sensors)
        -- Calculate when the fridge should turn on and off.
        let bpowerchange = calcpowerchange <$> btemperature
        onBehaviorChangeMaybe bpowerchange (actuators . FridgePower)
  where
        calcpowerchange (Sensed temp)
                | temp `belowRange` allowedtemp = Just PowerOff
                | temp `aboveRange` allowedtemp = Just PowerOn
                | otherwise = Nothing
        calcpowerchange SensorUnavailable = Nothing
        allowedtemp = Range 1 4

And here the code is being tested in a reproducible fashion:

> runner <- observeAutomation fridge mkSensors
> runner $ \sensors -> fridgeTemperature sensors =: 6
[FridgePower PowerOn]
> runner $ \sensors -> fridgeTemperature sensors =: 3
[]
> runner $ \sensors -> fridgeTemperature sensors =: 0.5
[FridgePower PowerOff]

BTW, building a 400 line library and writing reams of control code for a fridge that has not been installed yet is what we Haskell programmers call "laziness".

AIMS inverter control via GPIO ports

I recently upgraded my inverter to an AIMS 1500 watt pure sine inverter (PWRI150024S). This is a decent inverter for the price, I hope. It seems reasonably efficient under load compared to other inverters. But when it's fully idle, it still consumes 4 watts of power.

That's almost as much power as my laptop, and while 96 watt-hours per day may not sound like a lot of power, some days in winter, 100 watt-hours is my entire budget for the day. Adding more batteries just to power an idle inverter would be the normal solution, probably. Instead, I want to have my house computer turn it off when it's not being used.

Which brings me to the other problem with this inverter: the power control is not a throw switch, but a button you have to press and hold for a second. And looking inside the inverter, there was no easy way to hack in a relay to control it.

The inverter has an RJ22 control port, but AIMS does not seem to document what the pins do, so I reverse engineered them.

Since the power is toggled, it's important that the computer be able to check if the inverter is currently running, to reliably get to the desired on/off state.
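Here's a hedged sketch of the resulting control logic; readStatus and pulseButton are assumed helpers wrapping the GPIO reads and writes, not my actual code:

-- Sketch of reliably reaching a desired on/off state when the only
-- control is a toggle button: check the status line and pulse the
-- button only when the state differs, retrying a few times.
import Control.Concurrent (threadDelay)
import Control.Monad (when)

setInverter :: IO Bool -> IO () -> Bool -> IO ()
setInverter readStatus pulseButton wanted = go (3 :: Int)
  where
        go 0 = error "inverter not responding"
        go n = do
                running <- readStatus
                when (running /= wanted) $ do
                        pulseButton          -- hold the line ~1 second
                        threadDelay 2000000  -- let the inverter settle
                        go (n - 1)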

I designed (well, mostly cargo-culted) a circuit that uses 4n35 optoisolators to safely interface the AIMS with my cubietruck's GPIO ports, letting it turn the inverter on and off, and also check if it's currently running. I built this board; it's the first PCB I've designed and built myself.

The full schematic and haskell code to control the inverter are in the git repository https://git.joeyh.name/index.cgi/joey/homepower.git/tree/. My design notebook for this build is available in secure scuttlebutt along with power consumption measurements.

It works!

joey@darkstar:~>ssh house inverter status
off
joey@darkstar:~>ssh house inverter on
joey@darkstar:~>ssh house inverter status
on
three conferences one week

Thought I'd pack my entire year's conference schedule into one week...

First was a Neuroinformatics infrastructure interoperability workshop at McGill, my second trip to Montreal this year. Well outside my wheelhouse, but there's a fair amount of interest in that community in git-annex/datalad. This was a roll-with-the-acronyms, try-to-draw-parallels-to-things-I-know affair. Also excellent sushi and a bonus Secure Scuttlebutt meetup.

Then LibrePlanet. A unique and super special conference, that utterly flew by this year. This is my sixth LibrePlanet and I enjoy it more each time. Highlights for me were Bassam's photogrammetry workshop, Karen receiving the Free Software award, and Seth's thought-provoking talk on "incompossibilities", especially as applied to social networks. And some epic dinner conversations in Central Square.

Finally today, a one-day local(!) functional programming(!!) conference in Knoxville TN. Lambda Squared was the best constructed single-track conference I've seen. It started with an ex-pro-figure-skater getting the whole audience to pirouette, to capture that uncomfortable out-of-your-element feeling you get learning FP, and ramped gradually past "functional javascript" to orthogonality, contravariant functors, the lambda cube, and constructivist logic.

I notice that I've spent a lot more time in Boston than I ever have in Knoxville -- Cambridge MA is starting to feel like my old haunts, though I've never really lived there. There are not a lot of functional programming conferences in the southeastern USA, which I think explains how Lambda Squared attracted such a good lineup of speakers. Knoxville also has a surprisingly large and lively FP community shaping up. There will be another Lambda Squared next year, and that might be a good opportunity to visit with me and go to an FP conference too.

And now time to retreat into my retreaty place for a good long while.

prove you are not an Evil corporate person

In which Google be Google and I drop a hot AGPL tip.

[image: recaptcha.png]

Google Is Quietly Providing AI Technology for Drone Strike Targeting Project
Google Is Helping the Pentagon Build AI for Drones

to automate the identification and classification of images taken by drones — cars, buildings, people — providing analysts with increased ability to make informed decisions on the battlefield

These news reports don't mention reCaptcha explicitly, but it's been asking about a lot of cars lately. Whatever the source of the data that Google is using for this, it's disgusting that they're mining it from us without our knowledge or consent.

Google claims that "The technology flags images for human review, and is for non-offensive uses only". So, if a drone operator has a neural network, which we were all tricked and coerced into training to identify cars and people, helping to highlight them on their screen and center the crosshairs just right, and the neural network is not the thing pressing the kill switch, is it being used for "non-offensive purposes only"?


Google is known to be deathly allergic to the AGPL license. Not only on servers; they don't even allow employees to use AGPL software on workstations. If you write free software, and you'd prefer that Google not use it, a good way to ensure that is to license it under the AGPL.

I normally try to respect the privacy of users of my software, and of personal conversations. But at this point, I feel that Google's behavior has mostly obviated those moral obligations. So...

Now seems like a good time to mention that I have been contacted by multiple people at Google about several of my AGPL licensed projects (git-annex, and either keysafe or debug-me, I can't remember which) trying to get me to switch them to the GPL, and had long conversations with them about it.

Google has some legal advice that the AGPL source provision triggers much more often than it's commonly understood to. I encouraged them to make that legal reasoning public, so the community could address/debunk it, but I don't think they have. I won't go into details about it here, other than it seemed pretty bonkers.

Mixing in some AGPL code with an otherwise GPL codebase also seems sufficient to trigger Google's allergy. In the case of git-annex, it's possible to build all releases (until next month's) with a flag that prevents linking with any AGPL code, which should mean the resulting binary is GPL licensed, but Google still didn't feel able to use it, since the git-annex source tree includes AGPL files.

I don't know if Google's allergy to the AGPL extends to software used for drone murder applications, but in any case I look forward to preventing Google from using more of my software in the future.


(Illustration by scatter//gather)

futures of distributions

Seems Debian is talking about why they are unable to package whole categories of modern software, such as anything using npm. It's good they're having a conversation about that, and I want to give a broader perspective.

Lars Wirzenius's blog post about it explains the problem well from the Debian perspective. In short: The granularity at which software is built has fundamentally changed. It's now typical for hundreds of small libraries to be used by any application, often pegged to specific versions. Language-specific tools manage all the resulting complexity automatically, but distributions can't muster the manpower to package even a fraction of this stuff.

Lars lists some ideas for incremental improvements, but the space within which a Linux distribution exists has changed, and that calls not for incremental changes, but for a fundamental rethink from the ground up. Whether Debian is capable of making such fundamental changes at this point in its lifecycle is up to its developers to decide.

Perhaps other distributions are dealing with the problem better? One way to evaluate this is to look at how a given programming language community feels about a distribution's handling of their libraries. Do they generally see the distribution as a road block that must be worked around, or is the distribution a useful part of their workflow? Do they want their stuff included in the distribution, or does that seem like a lot of pointless bother?

I can only speak about the Haskell community. While there are some exceptions, it generally is not interested in Debian containing Haskell packages, and indeed system-wide installations of Haskell packages can be an active problem for development. This is despite Debian having done a much better job at packaging a lot of Haskell libraries than it has at, say, npm libraries. Debian still only packages one version of anything, and there is lag and complex process involved, and so there's friction with the Haskell community.

On the other hand, there is a distribution that the Haskell community broadly does like, and that's Nix. A subset of the Haskell community uses Nix to manage and deploy Haskell software, and there's generally a good impression of it. Nix seems to be doing something right, that Debian is not doing.

It seems that Nix also has pretty good support for working with npm packages, including ingesting a whole dependency chain into the package manager with a single command, and it has thousands of npm libraries included in the distribution. I don't know how the npm community feels about Nix, but my guess is they like it better than Debian.

Nix is a radical rethink of the distribution model. And it's jettisoned a lot of things that Debian does, like manually packaging software, or extreme license vetting. It's interesting that Guix, which uses the same technologies as Nix, but seems in many ways more Debian-like with its care about licensing etc, has also been unable to manage npm packaging. This suggests to me that at least some of the things that Nix has jettisoned need to be jettisoned in order to succeed in the new distribution space.

But. Nix is not really exploding in popularity from what I can see. It seems to have settled into a niche of its own, and is perhaps expanding here and there, but not rapidly. It's insignificant compared with things like Docker, that also radically rethink the distribution model.

We could easily end up with some nightmare of lithification, as described by Robert "r0ml" Lefkowitz in his Linux.conf.au talk. Endlessly copied and compacted layers of code, contained or in the cloud. Programmer-archeologists right out of a Vinge SF novel.

r0ml suggests that we assume that's where things are going (or indeed where they already are outside little hermetic worlds like Debian), and focus on solving technical problems, like deployment of modifications of cloud apps, that prevent users from exercising software freedoms.

In a way, r0ml's ideas are what led me to thinking about extending Scuttlebutt with Annah, and indeed if you squint at that right, it's an idea for a radically different kind of distribution.

Well, that's all I have. No answers of course.

easy-peasy-devicetree-squeezy

I've created a new program, with a silly name, that solves a silly problem with devicetree overlays. It seems that, although there are patches to fully support overlays, including loading them on the fly into a running system, they're not in the mainline kernel, and nobody seems to know if/when they will get mainlined.

So easy-peasy-devicetree-squeezy is a hack to make it easy to do devicetree-overlay-type things today. This program makes it easy peasy to squeeze together the devicetree for your board with whatever additions you need. It's pre-deprecated on release: as soon as devicetree overlay support lands, there will probably be no further need for it.

It doesn't actually use overlays; instead it arranges to include the kernel's devicetree file for your board together with whatever additions you need. The only real downside of this approach is that the kernel source tarball is needed. Benefits include being able to refer to any labels you need from the kernel's devicetree files, and being able to #include and use symbols like GPIO_ACTIVE_HIGH from the kernel headers.

It supports integrating into a Debian system so that the devicetree will be updated, with your additions, whenever the kernel is upgraded.

Source is in a git repository at https://git.joeyh.name/index.cgi/easy-peasy-devicetree-squeezy.git/
See the README for details.

If someone wants to package this up and include it in Debian, it's a simple shell script, so it should take about 10 minutes.

example use

Earlier I wrote about cubietruck temperature sensor setup, and the difficulty I had with modifying the device tree for that. With easy-peasy-devicetree-squeezy, I only have to create a file /etc/easy-peasy-devicetree-squeezy/my.dts that contains this:

    /* Device tree addition enabling onewire sensors
     * on CubieTruck GPIO pin PG8 */
    #include <dt-bindings/gpio/gpio.h>

    / {
            onewire_device {
                    compatible = "w1-gpio";
                    gpios = <&pio 6 8 GPIO_ACTIVE_HIGH>; /* PG8 */
                    pinctrl-names = "default";
                    pinctrl-0 = <&my_w1_pin>;
            };
    };

    &pio {
            my_w1_pin: my_w1_pin@0 {
                    allwinner,pins = "PG8";
                    allwinner,function = "gpio_in";
            };
    };

Then run "sudo easy-peasy-devicetree-squeezy --debian sun7i-a20-cubietruck"

Today's work was sponsored by Trenton Cronholm on Patreon.
