duperemove

Finds duplicate filesystem extents and optionally schedules them for deduplication. An extent is a small part of a file inside the filesystem. On some filesystems, one extent can be referenced multiple times when parts of the files' content are identical. More information: <https://markfasheh.github.io/duperemove/>.

Install

All systems
curl cmd.cat/duperemove.sh
Debian
apt-get install duperemove
Ubuntu
apt-get install duperemove
Kali Linux
apt-get install duperemove
Fedora
dnf install duperemove
Windows (WSL2)
sudo apt-get update
sudo apt-get install duperemove

  • Search for duplicate extents in a directory and show them:
    duperemove -r path/to/directory
  • Deduplicate duplicate extents on a Btrfs or XFS (experimental) filesystem:
    duperemove -r -d path/to/directory
  • Use a hash file to store extent hashes (reduces memory usage, and the file can be reused on subsequent runs):
    duperemove -r -d --hashfile=path/to/hashfile path/to/directory
  • Limit I/O threads (for the hashing and dedupe stages) and CPU threads (for the duplicate extent finding stage):
    duperemove -r -d --hashfile=path/to/hashfile --io-threads=N --cpu-threads=N path/to/directory
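The commands above can be combined into a small wrapper that reports duplicates by default and deduplicates only on request. This is a minimal sketch, not part of duperemove: the function name `dedupe_dir`, its argument order, and the `DUPEREMOVE` override variable are hypothetical, and it uses only the `-r`, `-d`, and `--hashfile` options listed above.

```shell
#!/bin/sh
# Hypothetical helper: report duplicate extents first, deduplicate only when asked.
# Arguments: directory, hash file, optional mode ("report" by default, or "dedupe").
dedupe_dir() {
    target=$1
    hashfile=$2
    mode=${3:-report}

    if [ "$mode" = "dedupe" ]; then
        # -d submits the duplicate extents found for deduplication.
        set -- -r -d "--hashfile=$hashfile" "$target"
    else
        # Without -d, duplicates are only searched for and shown.
        set -- -r "--hashfile=$hashfile" "$target"
    fi

    # DUPEREMOVE is an assumed override (for example "echo") to preview the command.
    "${DUPEREMOVE:-duperemove}" "$@"
}
```

Running `dedupe_dir path/to/directory path/to/hashfile` shows the duplicates, and adding `dedupe` as a third argument performs the deduplication while reusing the same hash file.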

© tl;dr; authors and contributors