
I sent a PR to add support for the necessary ioctl (FIDEDUPERANGE) to ZFS that I just have to clean up again.

Once that is in, any of the existing dupe-finding tools that use it (e.g. jdupes, duperemove) will just work on ZFS.



Knowing what you had to know to write that, would you dare using it?

Compression, encryption and streaming sparse files together are impressive already. But now we get a new BRT entry appearing out of nowhere, dedup index pruning one that was there a moment ago, all while correctly handling arbitrary errors in whatever simultaneous deduped writes, O_DIRECT writes, FALLOC_FL_PUNCH_HOLE and reads were waiting for the same range? Sounds like adding six new places to hold the wrong lock to me.


"Knowing what you had to know to write that, would you dare using it?"

It's no worse than anything else related to block cloning :)

ZFS already supports FICLONERANGE. The thing FIDEDUPERANGE changes is that the compare is part of the atomic guarantee.

So in fact, I'd argue it's actually better than what is there now. Yes, the hardest part is the locking, but the dedup range call takes the right locks upfront and passes them along, so nothing else is grabbing the wrong locks. It has to, because of what it takes to implement the ioctl properly: we have to be able to read both ranges, compare them, and clone them, all as one atomic operation with respect to concurrent writes. So instead of random things grabbing random locks, we pass the right locks around and everything verifies the locks.

This means FIDEDUPERANGE is not as fast as it maybe could be, but it does not run into the "oops we forgot the right kind of lock" issue. At worst, it would deadlock, because it holds exclusive locks on everything it could need before it starts to do anything, in order to guarantee that both the compare and the clone are atomic. So something under it that tries to grab one of those locks will just wait forever.

This seemed the safest course of implementation.

FICLONERANGE is only atomic in the cloning, which means it does not have to read anything first; it can just do blind block cloning. So it actually has a more complex (but theoretically faster) lock structure because of the relaxed constraints.


Note - anyone bored enough could already make any of these tools work by using FICLONERANGE (which ZFS already supports), but you'd have to do locking - lock, compare file ranges, clone, unlock.
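A rough sketch of that lock, compare, clone, unlock sequence from userspace, in Python. This is only an illustration and not what any existing tool does: the function names are made up, the ioctl number is the x86/ARM encoding of FICLONERANGE from <linux/fs.h>, and the locks are POSIX advisory locks, so they only keep out writers that take the same locks. FICLONERANGE also generally wants block-aligned offsets and lengths.

```python
import fcntl
import os
import struct

# _IOW(0x94, 13, struct file_clone_range) as encoded on x86 and ARM (assumption:
# a few other architectures encode ioctl numbers differently).
FICLONERANGE = 0x4020940D


def ranges_equal(fd_a, fd_b, offset, length, chunk=1 << 20):
    """Compare the same byte range of two open files, chunk by chunk."""
    done = 0
    while done < length:
        want = min(chunk, length - done)
        a = os.pread(fd_a, want, offset + done)
        b = os.pread(fd_b, want, offset + done)
        # A short read means the range runs past EOF; treat that as a mismatch.
        if len(a) != want or a != b:
            return False
        done += want
    return True


def clone_if_identical(src_path, dest_path, offset, length):
    """Lock, compare, clone, unlock. Returns True only if the range was cloned.

    Hypothetical helper: raises OSError if the filesystem rejects the clone.
    """
    with open(src_path, "rb") as src, open(dest_path, "r+b") as dest:
        # Advisory locks: shared on the source, exclusive on the destination.
        fcntl.lockf(src, fcntl.LOCK_SH, length, offset)
        fcntl.lockf(dest, fcntl.LOCK_EX, length, offset)
        try:
            if not ranges_equal(src.fileno(), dest.fileno(), offset, length):
                return False
            # struct file_clone_range: src_fd, src_offset, src_length, dest_offset
            arg = struct.pack("=qQQQ", src.fileno(), offset, length, offset)
            fcntl.ioctl(dest.fileno(), FICLONERANGE, arg)
            return True
        finally:
            fcntl.lockf(dest, fcntl.LOCK_UN, length, offset)
            fcntl.lockf(src, fcntl.LOCK_UN, length, offset)
```

Something like clone_if_identical("a.bin", "b.bin", 0, 1 << 20) would then share the first MiB of the two files if they match and the filesystem supports cloning.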

Because FIDEDUPERANGE has the compare as part of the atomic guarantee, you don't need to lock in userspace around using it, and so no dedup utility bothers to do FICLONERANGE + locking. Also, ZFS is the only FS that implements FICLONERANGE but not FIDEDUPERANGE :)


Shouldn't jdupes-like tools already work now that ZFS has reflink copy support?


No, because none of these tools use copy_file_range. copy_file_range doesn't guarantee deduplication or anything like it; it is meant to copy data. So you could end up copying data when you aren't trying to copy anything at all.
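To make that concrete, here is a small Python sketch of a copy_file_range loop (the helper name is made up for illustration). All the caller gets back is a byte count. Whether the kernel shared blocks or physically copied them is not reported, which is why a dedup tool can't rely on it.

```python
import os


def copy_with_copy_file_range(src_path, dest_path):
    """Copy a whole file with copy_file_range(2) and return the bytes copied.

    Hypothetical helper for illustration; needs Linux and Python 3.8+.
    """
    total = 0
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            # The kernel may reflink, offload, or plainly copy the data.
            # The return value is the same in every case: a byte count.
            n = os.copy_file_range(src.fileno(), dest.fileno(), remaining)
            if n == 0:
                break  # source ended earlier than expected
            total += n
            remaining -= n
    return total
```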

All modern tools use FIDEDUPERANGE, which is an ioctl meant for explicitly this use case: telling the FS that two files have bytes that should be shared.

Under the covers, the FS does block cloning or whatever to make it happen.

Nothing is copied.

ZFS does support FICLONERANGE, which is the same as FIDEDUPERANGE except that it does not verify the contents are the same prior to cloning.

Both are atomic with respect to concurrent writes, but for FIDEDUPERANGE the compare is part of the atomic operation. So you don't have to do any locking.

If you used FICLONERANGE, you'd need to lock the two file ranges, verify, clone, unlock.

FIDEDUPERANGE does this for you.

So it is possible, with no changes to ZFS, to modify dedup tools to work on ZFS by changing them to use FICLONERANGE + locking if FIDEDUPERANGE does not exist.


Oh cool! Does this work on the block level or only the file level?



