Veintidos

Prototype for content-addressed self-deduplicating storage using Ceph. Similar to Plan 9 Venti, but on top of RADOS.

Idea

The core idea is to content address Ceph objects. Usually Ceph object names are arbitrarily chosen by a client application. In content addressed storage (CAS) this object name is determined by fingerprinting the content.

Therefore, writing the same data results in the same fingerprint. The same fingerprint means that we access the same object, since we use the fingerprint as its name.

To keep track about how many clients refer to a single object, an object stores a reference counter, that is incremented every time the same data is written by a client.

The reference count is also used to determine when an object is not used anymore.

With this approach in Ceph we get storage pools that deduplicate data.

The drawback of this approach is, that extra metadata is necessary to map client file names to CAS objects. The extra metadata is stored in index objects that use a filename as the object name just as regular Ceph objects. They contain an object map with a map of version numbers and references to recipe objects.

Recipe objects are stored like CAS objects, so that storing the same file twice will also deduplicate the recipe object.

A recipe object stores an extent-like list that maps file regions to fingerprints of CAS objects.

Example

Situation: A client stores a backup tarball of several gigabytes using Veintidos.

First Veintidos uses a chunker to split the tarball. The chunker generates a list of chunk extents: (offset, size, data chunk).

The data chunk is written as a CAS object and the chunk extent is changed to (offset, size, fingerprint).

All of this tuples are recorded and after all CAS writes finish converted into a recipe.

This recipe is then written as another CAS object and its fingerprint together with a version number stored in an index object that takes the name of the original tarball.

Implementation

The system consists of a Ceph Object Class (github), that implements reference counting and metadata storage for CAS objects, and a client library that adds chunking, recipes, and compression.

Source code documentation is available on Github Pages: https://irq0.github.io/veintidos/

From bottom-up the Veintidos stack looks like this:

Ceph Object Class - New CAS RADOS Operations (Run on OSD)
RADOS, librados, Python bindings
cas.py - Python interface to CAS object class
- compressor.py - Compression for CAS class
chunk.py - Chunker abstraction (write_full, read_full) and static chunker
- fingerprint.py - Fingerprinting functions
- recipe.py - Read / write recipe objects
veintidos.py - CLI

API Outline

cas.put(data) -> fingerprint (refcount++)
cas.get(fingerprint) -> data
cas.up(fingerprint) -> (refcount++)
cas.down(fingerprint) -> (refcount–)
cas.list() -> [[fingerprint, refcount], …]
cas.info() -> metadata
chunker.write_full(name, data) -> version (cas.put(data), cas.put(recipe))
chunker.read_full(name) -> data (cas.get(recipe), cas.get(data))
chunker.read(name, off, size) -> data
chunker.versions(name) -> [HEAD, …] chunker.remove_version(name, version)
chunker.remove_version(name, version)

CAS Pool

Veintidos uses a single Ceph pool, but two distinct namespaces to avoid name collisions between content-defined names and user-defined names.

The CAS namespace for data chunks and recipe objects. Names are content defined
INDEX namespace for index objects that map filenames to recipe objects. Names are user defined

CAS Objects

Store arbitrary data
Created using cas.put and named after the fingerprint of their content
Reference counted by the CAS Ceph object class
Store additional metadata for the compression and fingerprinting algorithm

Recipe Objects

Store an encoded list of extents
Each extent has the form (offset, length, fingerprint)
Stored as CAS objects
Written after all chunks are successfully written
Do not store the name of the original file

Index Objects

Associate recipe objects with a filename and version
The version is a UNIX timestamp
Store version -> fingerprint mapping as an object map

Limitations

Since Veintidos is a prototype the implementation has some limitations:

Client-side code is written in Python
CAS PUT uses JSON
- Data is Base64 encoded; Metadata is taken as is by the object class
- A rewrite in C++ could leverage the Ceph encode/decoder, which is unavailable in Python
CAS GET is implemented using regular RADOS operations
- Can’t use Cephx to limit operations to CAS object class
CAS Chunker has no partial write support
- Can’t partially update files
No structure sharing between recipes of the same file
Only static chunking
- Add dynamic chunker to increase dedup ratio
Recipes are implemented client side and not in an object class
- Possibly move get_extents_in_range to OSD to speed up operations such as partial read/write
Client crashes result in orphaned objects
- If a client crashes before writing the recipe and index objects the data objects in the CAS pool end up being unreferenced
- Fix: Add intent logging to client

Usage

The code contains both a library and a command line utility to write files to a CAS pool.

CLI

veintidos.py put "backup" <(tar cvf - /)
veintidos.py get "backup" root_backup.tar

Library

Ventidos has two layers:

CAS

Thin layer over the RADOS / CAS Object class. Provides methods to put, get and increment / decrement the reference counter of objects

Chunk

Adds chunking and recipes on top of CAS.

Dependencies

Ceph Cluster with CAS object class installed. Not part of mainline Ceph. Branch: github
Python 2.7
Python RADOS bindings with execute support
msgpack
python-snappy
nose for the unittests

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
_img		_img
test		test
util		util
veintidos		veintidos
.flake8		.flake8
README.org		README.org
requirements.txt		requirements.txt
run_tests		run_tests
veintidos.py		veintidos.py

irq0/veintidos

Folders and files

Latest commit

History

Repository files navigation