PoW bucket bloom: throttling anonymous clients with proof of work, token buckets, and bloom filters

An interesting side problem in keysafe's design is that keysafe servers, which run as tor hidden services, allow anonymous data storage and retrieval. While each object is limited to 64 kb, what's to stop someone from making many requests and using it to store some big files?

The last thing I want is a git-annex keysafe special remote. ;-)

I've done a mash-up of three technologies to solve this, that I think is perhaps somewhat novel. Although it could be entirely old hat, or even entirely broken. (All I know so far is that the code compiles.) It uses proof of work, token buckets, and bloom filters.

Each request can have a proof of work attached to it, which is just a value that, when hashed with a salt, starts with a certain number of 0's. The salt includes the ID of the object being stored or retrieved.

The server maintains a list of token buckets. The first can be accessed without any proof of work, and subsequent ones need progressively more proof of work to be accessed.

Clients will start by making a request without a PoW, and that will often succeed, but when the first token bucket is being drained too fast by other load, the server will reject the request and demand enough proof of work to allow access to the second token bucket. And so on down the line if necessary. At the worst, a client may have to do 8-16 minutes of work to access a keysafe server that is under heavy load, which would not be ideal, but is acceptible for keysafe since it's not run very often.

If the client provides a PoW good enough to allow accessing the last token bucket, the request will be accepted even when that bucket is drained. The client has done plenty of work at this point, so it would be annoying to reject it. To prevent an attacker that is willing to burn CPU from abusing this loophole to flood the server with object stores, the server delays until the last token bucket fills back up.

So far so simple really, but this has a big problem: What prevents a proof of work from being reused? An attacker could generate a single PoW good enough to access all the token buckets, and flood the server with requests using it, and so force everyone else to do excessive amounts of work to use the server.

Guarding against that DOS is where the bloom filters come in. The server generates a random request ID, which has to be included in the PoW salt and sent back by the client along with the PoW. The request ID is added to a bloom filter, which the server can use to check if the client is providing a request ID that it knows about. And a second bloom filter is used to check if a request ID has been used by a client before, which prevents the DOS.

Of course, when dealing with bloom filters, it's important to consider what happens when there's a rare false positive match. This is not a problem with the first bloom filter, because a false positive only lets some made-up request ID be used. A false positive in the second bloom filter will cause the server to reject the client's proof of work. But the server can just request more work, or send a new request ID, and the client will follow along.

The other gotcha with bloom filters is that filling them up too far sets too many bits, and so false positive rates go up. To deal with this, keysafe just keeps count of how many request IDs it has generated, and once it gets to be too many to fit in a bloom filter, it makes a new, empty bloom filter and starts storing request IDs in it. The old bloom filter is still checked too, providing a grace period for old request IDs to be used. Using bloom filters that occupy around 32 mb of RAM, this rotation only has to be done every million requests of so.

But, that rotation opens up another DOS! An attacker could cause lots of request IDs to be generated, and so force the server to rotate its bloom filters too quickly, which would prevent any requests from being accepted. To solve this DOS, just use one more token bucket, to limit the rate that request IDs can be generated, so that the time it would take an attacker to force a bloom filter rotation is long enough that any client will have plenty of time to complete its proof of work.

This sounds complicated, and probably it is, but the implementation only took 333 lines of code. About the same number of lines that it took to implement the entire keysafe HTTP client and server using the amazing servant library.

There are a number of knobs that may need to be tuned to dial it in, including the size of the token buckets, their refill rate, the size of the bloom filters, and the number of argon2 iterations in the proof of work. Servers may eventually need to adjust those on the fly, so that if someone decides it's worth burning large quantities of CPU to abuse keysafe for general data storage, the server throttles down to a rate that will take a very long time to fill up its disk.

This protects against DOS attacks that fill up the keysafe server storage. It does not prevent a determined attacker, who has lots of CPU to burn, from flooding so many requests that legitimate clients are forced to do an expensive proof of work and then time out waiting for the server. But that's an expensive attack to keep running, and the proof of work can be adjusted to make it increasingly expensive.