Hammerspace - Data de-duplication for the Tux3 Linux file system

Team

  • Chinmay Kamat
  • Gaurav Tungatkar
  • Kushal Dalmia
  • Amey Magar

Mentor

Amey Inamdar

Year: 2008-2009

Synopsis

The growing volume of data stored and backed up in data centres is a major concern. A key observation is that most backup jobs contain only a small percentage of genuinely new data, typically less than five percent. The rest duplicates data that has remained unmodified since the previous backup. Eliminating this duplicate data promises to reduce storage needs and improve data restore times considerably. This project presents a solution that eliminates such duplicate data using "Data De-duplication".

The de-duplication is performed inline, that is, as data is written rather than in a later pass, and at block granularity. We use the Tux3 file system for the prototype implementation. Tux3 is a write-anywhere, atomic-commit, btree-based versioning file system being developed by Daniel Phillips. It aims to provide efficient snapshotting and replication, with its main use in Network Attached Storage. Tux3 is a recent file system that shows promise for inclusion in the Linux kernel.

The design includes a btree based lookup layer on top of a Bucket data structure. The Locality Based Bucket Layout and Fingerprint Index enable fast and efficient detection and elimination of duplicate data blocks. The design is integrated into the filesystem and does not require any application level intelligence.
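The idea of inline, block-level de-duplication behind a fingerprint index can be illustrated with a small user-space sketch. This is not the project's Tux3 code, and every detail here is an assumption made for illustration: the 4 KiB block size, SHA-1 as the fingerprint function, the bucket capacity, the class and method names, and the use of a sorted Python list as a stand-in for the btree lookup layer. Buckets are filled in write order, which is one simple way to keep blocks that were written together stored together.

```python
import hashlib
from bisect import bisect_left, insort

BLOCK_SIZE = 4096       # assumed block size, not taken from the project
BUCKET_CAPACITY = 4     # assumed; kept tiny so the example is easy to follow


class DedupStore:
    """Toy inline, block-level de-duplicating store (illustrative only)."""

    def __init__(self):
        self.buckets = [[]]      # each bucket holds unique blocks in write order
        self.fingerprints = []   # sorted fingerprints, stand-in for the btree
        self.locations = {}      # fingerprint -> (bucket number, slot)
        self.refcounts = {}      # fingerprint -> number of references

    def _lookup(self, fp):
        # Binary search over sorted keys, roughly what a btree lookup does.
        i = bisect_left(self.fingerprints, fp)
        if i < len(self.fingerprints) and self.fingerprints[i] == fp:
            return self.locations[fp]
        return None

    def write_block(self, block):
        """Store one block, or just add a reference if it already exists."""
        fp = hashlib.sha1(block).digest()
        if self._lookup(fp) is None:
            # New data: append to the current bucket, opening a new one if full.
            if len(self.buckets[-1]) >= BUCKET_CAPACITY:
                self.buckets.append([])
            self.buckets[-1].append(block)
            self.locations[fp] = (len(self.buckets) - 1,
                                  len(self.buckets[-1]) - 1)
            insort(self.fingerprints, fp)
            self.refcounts[fp] = 0
        self.refcounts[fp] += 1
        return fp

    def write(self, data):
        """Split data into blocks; return the fingerprints that describe it."""
        return [self.write_block(data[i:i + BLOCK_SIZE])
                for i in range(0, len(data), BLOCK_SIZE)]

    def read(self, fps):
        """Reassemble data from a list of fingerprints."""
        out = []
        for fp in fps:
            bucket, slot = self.locations[fp]
            out.append(self.buckets[bucket][slot])
        return b"".join(out)

    def stored_blocks(self):
        """Number of physical blocks actually kept."""
        return sum(len(b) for b in self.buckets)
```

In this sketch, writing the same two-block "backup" twice keeps only two physical blocks; the second write only raises the reference counts. A real file system would also have to handle concerns the sketch ignores, such as on-disk layout, freeing blocks when reference counts drop to zero, and crash consistency.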

Achievements

  • Won the Open Software Project Competition, Concepts 2009 at PICT, Pune.
  • Won the 1st Prize in the System Applications category, Concepts 2009 at PICT, Pune.
  • Won the 3rd Prize in Pratibha (Paper Presentation), Concepts 2009 at PICT, Pune, for the paper “Data De-duplication for btree based Linux file systems”.
  • Also won prizes at VIT, BITS Goa, and VJTI, Mumbai.

Links