Skip to content
/ veintidos Public

Prototype for a deduplicating Ceph client library

Notifications You must be signed in to change notification settings

irq0/veintidos

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Veintidos

Prototype for content-addressed self-deduplicating storage using Ceph. Similar to Plan 9 Venti, but on top of RADOS.

Idea

The core idea is to content address Ceph objects. Usually Ceph object names are arbitrarily chosen by a client application. In content addressed storage (CAS) this object name is determined by fingerprinting the content.

Therefore, writing the same data results in the same fingerprint. The same fingerprint means that we access the same object, since we use the fingerprint as its name.

To keep track about how many clients refer to a single object, an object stores a reference counter, that is incremented every time the same data is written by a client.

The reference count is also used to determine when an object is not used anymore.

With this approach in Ceph we get storage pools that deduplicate data.

The drawback of this approach is, that extra metadata is necessary to map client file names to CAS objects. The extra metadata is stored in index objects that use a filename as the object name just as regular Ceph objects. They contain an object map with a map of version numbers and references to recipe objects.

Recipe objects are stored like CAS objects, so that storing the same file twice will also deduplicate the recipe object.

A recipe object stores an extent-like list that maps file regions to fingerprints of CAS objects.

Example

Situation: A client stores a backup tarball of several gigabytes using Veintidos.

First Veintidos uses a chunker to split the tarball. The chunker generates a list of chunk extents: (offset, size, data chunk).

The data chunk is written as a CAS object and the chunk extent is changed to (offset, size, fingerprint).

All of this tuples are recorded and after all CAS writes finish converted into a recipe.

This recipe is then written as another CAS object and its fingerprint together with a version number stored in an index object that takes the name of the original tarball.

Implementation

The system consists of a Ceph Object Class (github), that implements reference counting and metadata storage for CAS objects, and a client library that adds chunking, recipes, and compression.

Source code documentation is available on Github Pages: https://irq0.github.io/veintidos/

From bottom-up the Veintidos stack looks like this:

API Outline

  • cas.put(data) -> fingerprint (refcount++)
  • cas.get(fingerprint) -> data
  • cas.up(fingerprint) -> (refcount++)
  • cas.down(fingerprint) -> (refcount–)
  • cas.list() -> [[fingerprint, refcount], …]
  • cas.info() -> metadata
  • chunker.write_full(name, data) -> version (cas.put(data), cas.put(recipe))
  • chunker.read_full(name) -> data (cas.get(recipe), cas.get(data))
  • chunker.read(name, off, size) -> data
  • chunker.versions(name) -> [HEAD, …] chunker.remove_version(name, version)
  • chunker.remove_version(name, version)

CAS Pool

./_img/veintidos_namespaces.png

Veintidos uses a single Ceph pool, but two distinct namespaces to avoid name collisions between content-defined names and user-defined names.

  1. The CAS namespace for data chunks and recipe objects. Names are content defined
  2. INDEX namespace for index objects that map filenames to recipe objects. Names are user defined

CAS Objects

  • Store arbitrary data
  • Created using cas.put and named after the fingerprint of their content
  • Reference counted by the CAS Ceph object class
  • Store additional metadata for the compression and fingerprinting algorithm

Recipe Objects

./_img/veintidos_recipe.png

  • Store an encoded list of extents
  • Each extent has the form (offset, length, fingerprint)
  • Stored as CAS objects
  • Written after all chunks are successfully written
  • Do not store the name of the original file

Index Objects

./_img/veintidos_index_object.png

  • Associate recipe objects with a filename and version
  • The version is a UNIX timestamp
  • Store version -> fingerprint mapping as an object map

Limitations

Since Veintidos is a prototype the implementation has some limitations:

  • Client-side code is written in Python
  • CAS PUT uses JSON
    • Data is Base64 encoded; Metadata is taken as is by the object class
    • A rewrite in C++ could leverage the Ceph encode/decoder, which is unavailable in Python
  • CAS GET is implemented using regular RADOS operations
    • Can’t use Cephx to limit operations to CAS object class
  • CAS Chunker has no partial write support
    • Can’t partially update files
  • No structure sharing between recipes of the same file
  • Only static chunking
    • Add dynamic chunker to increase dedup ratio
  • Recipes are implemented client side and not in an object class
    • Possibly move get_extents_in_range to OSD to speed up operations such as partial read/write
  • Client crashes result in orphaned objects
    • If a client crashes before writing the recipe and index objects the data objects in the CAS pool end up being unreferenced
    • Fix: Add intent logging to client

Usage

The code contains both a library and a command line utility to write files to a CAS pool.

CLI

veintidos.py put "backup" <(tar cvf - /)
veintidos.py get "backup" root_backup.tar

Library

Ventidos has two layers:

CAS

Thin layer over the RADOS / CAS Object class. Provides methods to put, get and increment / decrement the reference counter of objects

Chunk

Adds chunking and recipes on top of CAS.

Dependencies

  • Ceph Cluster with CAS object class installed. Not part of mainline Ceph. Branch: github
  • Python 2.7
  • Python RADOS bindings with execute support
  • msgpack
  • python-snappy
  • nose for the unittests

About

Prototype for a deduplicating Ceph client library

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published