[btrfs] is vulnerable to a hash-DoS attack (junod.info)
44 points by giis on Dec 13, 2012 | hide | past | favorite | 25 comments


This is a misleading title. The limits of this attack make it so that you're better off fork-bombing the system than trying to "hack" it with this kind of attack. The two attacks specified are making it impossible to create a specifically named file in a shared directory, and making deletes take a long time. Neither of these is a real DoS attack, and Chris appears to be taking the reasonable approach of acknowledging that it can be made better and scheduling it for the next pull window.

This whole article smells of grandstanding.


That's extremely uncharitable. The major difference that you gloss over is that one has tools to stop fork bombs, but there are no tools to stop this kind of attack.

This attack can easily be used to disrupt systems. I can imagine every naive implementation of file upload out there being vulnerable to this. When the system goes to delete old uploaded files, it halts.

I'm not sure how "making deletes take a long time" is not a DoS attack.
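A rough sketch of why such an attack is cheap to mount: with a 32-bit name hash, a birthday search finds two distinct filenames with the same hash in roughly 2^16 to 2^17 attempts, fast enough to do on any laptop. (This sketch uses zlib's plain CRC32 as a stand-in for btrfs's CRC32C, and the filename pattern is made up for illustration.)

```python
import itertools
import zlib


def find_crc32_collision():
    """Birthday-search for two distinct names whose CRC32 values collide."""
    seen = {}  # 32-bit hash -> first name seen with that hash
    for i in itertools.count():
        name = f"upload-{i}.dat"
        h = zlib.crc32(name.encode())
        if h in seen:
            return seen[h], name  # two different names, same 32-bit hash
        seen[h] = name


a, b = find_crc32_collision()
assert zlib.crc32(a.encode()) == zlib.crc32(b.encode())
```

With such pairs in hand, an attacker uploading files with chosen names can pack a directory with colliding entries and trigger the slow-delete behavior described above.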


I don't like innovation in filesystems. ext4 might be boring, but ext4 wrecks don't make the evening news the way wrecks with ZFS do. I worked on a system that used reiserfs in production, and we were always dealing with problems caused by the weak reliability guarantees. For instance, when the system crashed we'd find our filesystem full of files full of trash data (it had allocated space for the files but never written the data, so this is what was left behind).

Since the system would try to read the trash data, this was a problem.

I was looking at the btrfs documentation the other day and noticed that (1) it behaves worse when space runs out than ext4 does, and (2) there's no accurate way to measure free space on a btrfs volume so it's hard to avoid running out of space.

Whatever benefits btrfs has are erased if you have to deal with wrecked and full filesystems all the time.


Regarding (1), yes, a COW filesystem will (generally) behave "worse" than a non-COW filesystem when space is limited. But this is a design choice that makes sense for many use cases. Just to point out a few:

* Snapshots are virtually free.
* Writes are sequential.
* It's "harder" to destroy existing data (blocks are not overwritten in place).

These design decisions are not made lightly.


In this case, Btrfs has not invented anything new. Hashed directories are a standard technique used in all modern filesystems, including Ext3/4 or XFS. It would be interesting if someone tried to get hash collisions in them with modern hardware.


The other day it wouldn't even allow me to remove or truncate files, because it was running out of space.

I was annoyed: how do you free space if you can't remove anything? There wasn't any snapshot of those files, but I was able to free space by removing another snapshot (which I wanted to keep, but never mind...).


The goal is of course that in the end btrfs will have better guarantees than ext4, for example with data and metadata checksums.


Not a new attack. There is a decent history of an arms race around TOCTTOU races in access/open that leads to this type of algorithmic-complexity attack against bad hashing. Maybe btrfs does worse things when you do this, but it's not alone.

See http://www.cs.stonybrook.edu/~xcai/races2.pdf and some of its references.


Doesn't the first paragraph of the article talk about the history of hash collision attacks? Or are you talking about some other way in which this is not a new attack?

Also, looking at his numbers, btrfs is doing something very wrong over simple hash collisions. A filesystem that takes over 5 seconds to create 61 files is simply broken (in this situation; I'm not making general comments about btrfs).


You're right, I skimmed and that's my fault.

However, it's worth noting that these sorts of attacks can give privilege escalation, not just DoS.


Another job for siphash?

https://131002.net/siphash/
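SipHash itself isn't exposed in Python's standard library, but the idea behind it, a short keyed hash whose collisions can't be precomputed without knowing the secret key, can be illustrated with BLAKE2b's keyed mode. (The digest size and per-filesystem key here are illustrative assumptions, not what any real filesystem does.)

```python
import hashlib
import os


def keyed_name_hash(name: bytes, key: bytes) -> int:
    # 8-byte keyed digest: an attacker who doesn't know `key` cannot
    # precompute colliding filenames offline, which defeats the hash-DoS.
    digest = hashlib.blake2b(name, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little")


key = os.urandom(16)  # e.g. a secret chosen once at mkfs time
h = keyed_name_hash(b"some-file.txt", key)
```

The key would have to be stored with the filesystem, since the same name must hash to the same value on every mount.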


So it seems like fixing this is as simple as using a hash algorithm that's more robust to collisions? That's not too bad; am I missing something more severe?


The hash issues aren't just causing a slight slowdown; they seem to be bringing the fs to a complete halt. Also, changing the hash, unless done with care, will cause compatibility issues.


CRC32 used to be pretty common -- is there a typical replacement being used instead? Of course the usual mantra is "it depends on what you're using it for", but I'm curious if there's a typical drop-in.

>Also, changing hash, unless done with care, will cause compatibility issues.

I'm not calling you out, but I'm interested: what kinds of compatibility issues are you thinking of?


I was mainly thinking about the disk image not being mountable by older kernels. Filesystems usually aim to change as little as possible.


On LWN, Chris Mason is responding:

https://lwn.net/Articles/529077


I wonder if ZFS is vulnerable to something similar?


Sadly, this article gets some stuff about btrfs wrong. Allow me to clarify:

* First off, btrfs does not use "hash tables". Instead, it uses hashes to create keys that index into B-trees. The problem is that hash collisions are not handled efficiently: btrfs is forced to do a lot of work to deal with them.

* ZFS uses two distinct data structures to represent the contents of a directory: the so-called "micro-ZAP" and the so-called "fat-ZAP".

micro-ZAPs are for small directories. This is OK as the number of collisions is limited by the relatively small size of the micro-ZAP.

fat-ZAPs are like on-disk extensible hash-tables.

Both use CRC64; I think fat-ZAPs might have a problem.
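A toy model of the btrfs point above: when an attacker makes every name land on the same hash value, the collision chain degenerates into a linear scan, so creating or deleting n colliding entries costs O(n^2) work overall. (This data structure is a deliberate simplification for illustration, not btrfs's actual on-disk layout.)

```python
class ToyDir:
    """Directory keyed by a small name hash; colliding names chain in a list."""

    def __init__(self):
        self.buckets = {}  # hash value -> list of colliding names

    def create(self, name, name_hash):
        chain = self.buckets.setdefault(name_hash, [])
        if name in chain:  # linear scan of the collision chain
            raise FileExistsError(name)
        chain.append(name)

    def unlink(self, name, name_hash):
        self.buckets[name_hash].remove(name)  # another linear scan


d = ToyDir()
for i in range(2000):
    d.create(f"f{i}", 0)  # attacker-chosen names: everything hashes to 0
for i in range(2000):
    d.unlink(f"f{i}", 0)  # each delete rescans the shrinking chain
```

With a well-distributed (or keyed) hash, each chain stays short and both operations are effectively constant time.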


That's interesting. Is there any guidance as to why the tree keys are derived hashes instead of the actual file names? Obviously the original space is invulnerable to collisions by definition. Was the point simply to make the keys smaller to fit in a single machine word? Is that really helpful for performance vs. an optimized strcmp?

I'm generally a big btrfs fan, but here it seems like they got caught due to a senseless overoptimization...


Variable-length keys are difficult to implement efficiently. The alternative is not doing file-name lookups by key query, which would damage performance in unpathological cases.


Really? I'm not sure I buy that. B-tree traversal in real world cases is virtually always going to be I/O or memory bandwidth bound. That's just not going to be sensitive to the handful of cycles you save except in the case of tiny directories that are already in L1/L2 cache. But there, you're paying the up-front cost of hashing the input file name as "extra" and it's not even clear to me you'd save anything overall.

Basically, this just smells like a premature optimization to me. If it were my project and lacked the hashing feature, and someone wanted to add it, I'd really want to see some numbers before accepting it.


"B-tree traversal in real world cases is virtually always going to be I/O or memory bandwidth bound."

I am not an expert on file systems, but have you thought about the following:

- using file names in disk blocks rather than file name hashes means fewer entries per disk block. That, in turn, changes the constant of your B-tree traversal.

- with variable-length keys, keeping your B-trees balanced is tricky, if not practically impossible.

EDIT: a disadvantage of using hashes is that you need to read the filename proper (with small hashes, you cannot ignore hash collisions). That would be an extra I/O, so I guess this would not be beneficial for small directories. Maybe you could start out by having in-block hashes and filenames, and only move the names to a separate block when you need a second block?


The most common implementation technique that I know of is to include a small constant number (~4) bytes of the actual name next to the hash.
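A minimal sketch of that scheme (the tuple layout and prefix length are assumptions for illustration; a real filesystem would pack this into a fixed-width on-disk key): the comparison key is the hash plus the first few bytes of the name, so two names that collide on the hash still usually compare as different keys.

```python
import zlib

PREFIX_LEN = 4  # ~4 bytes of the actual name, per the comment above


def dir_key(name: bytes) -> tuple:
    # Keys sort by hash first, then by the name prefix, so hash-colliding
    # names only need a full-name comparison when the prefixes match too.
    return (zlib.crc32(name), name[:PREFIX_LEN])
```

Names whose hashes collide but whose leading bytes differ still get distinct keys, so the tree rarely has to fall back to chained full-name comparisons.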


I can see how that would help if your filenames are bin, dev, opt, usr, and var, but in the general case, I do not see how that beats having a longer hash (or a second, independent hash).

Four bytes of the name will have less entropy than such data, and having part of the filename around will not help ensure that a matching hash implies a matching filename.

Can you explain this, or name a filesystem doing this?


Possibly, technically, but IIRC ZFS's hashes are more nontrivial to collide than CRC32 by default (for the data/metadata checksums: fletcher, sha256, etc.). I can't even guess about the pathology of the dedup code. I want to say that the equivalent filename lookups/creation are fundamentally different in ZFS, but I genuinely don't recall how that sausage is made.



