GitHub - comsi02/murder: Large scale server deploys using BitTorrent and the BitTornado library

Murder by Larry Gadea <lg@twitter.com>
Copyright 2010 Twitter Inc.

DESCRIPTION
-----------

Murder is a method of using BitTorrent to distribute files to a large number 
of servers within a production environment. This allows for scalable and fast
deploys in environments of hundreds to tens of thousands of servers where
centralized distribution systems wouldn't otherwise function. A "Murder" is
normally used to refer to a flock of crows, which in this case applies to a
bunch of servers doing something.

In order to do a Murder transfer, several components must be set up 
beforehand, many of them a consequence of the BitTorrent nature of the system. 
Murder is based on BitTornado.

- A torrent tracker. This tracker, started by running the 'murder_tracker.py' 
script, runs a self-contained server on one machine. Although technically this 
is still a centralized system (everyone relies on this tracker), the 
communication between this server and the rest is minimal and normally 
acceptable. To keep things simple, tracker-less distribution (DHT) is currently 
not supported. The tracker is actually just a mini-httpd that hosts an 
/announce path to which the BitTorrent clients report their state.

- A seeder. This is the server which has the files that you'd like to deploy 
onto all other servers. For Twitter, this is the server that did the git diff. 
The files are placed into a directory that a torrent gets created from. Murder 
will tgz up the directory and create a .torrent file (a very small file 
containing basic hash information about the tgz file). This .torrent file lets 
the peers know what they're downloading. The tracker keeps track of which 
.torrent files are currently being distributed. Once a Murder transfer is 
started, the seeder will be the first server that many machines contact for 
pieces. These pieces are then distributed in a tree-like fashion through the 
rest of the network, so peers do not necessarily get them from the seeder.

- Peers. This is the group of servers (hundreds to tens of thousands) which 
will be receiving the files and distributing the pieces amongst themselves. 
Once a peer is done downloading the entire tgz file, it will continue seeding 
for a while to prevent a hotspot effect on the seeder.
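
To make the "basic hash information" in a .torrent file concrete: among other
things, it records a SHA-1 digest for each fixed-size piece of the payload,
which is how peers can verify pieces they receive from one another. The
following Python sketch is illustrative only and is not Murder's code (the
real work is done by BitTornado); the function name piece_hashes and the
256 KiB piece length are assumptions.

```python
import hashlib
import io


def piece_hashes(stream, piece_length=256 * 1024):
    """Return one SHA-1 digest per fixed-size piece read from a binary stream.

    Hypothetical helper for illustration; the final piece may be shorter
    than piece_length.
    """
    digests = []
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        digests.append(hashlib.sha1(piece).digest())
    return digests


if __name__ == "__main__":
    # 600 KiB of data split into 256 KiB pieces: 256 + 256 + 88 KiB.
    payload = io.BytesIO(b"x" * (600 * 1024))
    hashes = piece_hashes(payload)
    print(len(hashes), "pieces,", len(hashes[0]), "bytes per digest")
```

Because each digest is only 20 bytes, the .torrent stays very small even for
a large tgz.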

MURDER TRANSFER PROCESS
-----------------------

1. Configure the list of servers and general settings in murder_config.rb. (one time)
2. Distribute the Murder files to all your servers: (one time)
    cap murder:distribute_files
3. Start the tracker: (one time)
    cap murder:start_tracker
4. Create a torrent file from a remote directory of files (on seeder):
    cap murder:create_torrent tag="Deploy20100101" files_path="~/files"
5. Start seeding the files:
    cap murder:start_seeding tag="Deploy20100101"
6. Distribute the files to all servers:
    cap murder:peer tag="Deploy20100101" destination_path="/tmp/out"

Once completed, all files will be in /tmp/out/Deploy20100101/ on all servers.

EXAMPLE DEPLOY
--------------

cap murder:distribute_files
cap murder:start_tracker
cap murder:create_torrent tag="Deploy20100101" \
  files_path="/usr/local/twitter/production/current"
cap murder:start_seeding tag="Deploy20100101"
time cap murder:peer tag="Deploy20100101" \
  destination_path="/usr/local/twitter/releases"

All the files have been transferred to all servers at this point. To clean up
use:

cap murder:stop_seeding tag="Deploy20100101"
cap murder:stop_tracker
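
The same sequence can be scripted. The following is a minimal Python sketch
and not part of Murder: murder_commands and run_deploy are hypothetical helper
names, and it assumes cap is on the PATH and is run from the directory
containing the Capfile. It passes the tag to stop_seeding, which the task
reference says is required. With dry_run left on, it only prints the commands.

```python
import shlex
import subprocess


def murder_commands(tag, files_path, destination_path):
    """Build the cap invocations from the example deploy, in order."""
    return [
        ["cap", "murder:distribute_files"],
        ["cap", "murder:start_tracker"],
        ["cap", "murder:create_torrent", f"tag={tag}", f"files_path={files_path}"],
        ["cap", "murder:start_seeding", f"tag={tag}"],
        ["cap", "murder:peer", f"tag={tag}", f"destination_path={destination_path}"],
        ["cap", "murder:stop_seeding", f"tag={tag}"],
        ["cap", "murder:stop_tracker"],
    ]


def run_deploy(tag, files_path, destination_path, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True stops the sequence at the first failing task.
    """
    for cmd in murder_commands(tag, files_path, destination_path):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_deploy(
        "Deploy20100101",
        "/usr/local/twitter/production/current",
        "/usr/local/twitter/releases",
    )
```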

HOW TO LEARN MURDER INCREMENTALLY
---------------------------------

- Skim dist/*.py to see what Murder does behind the scenes
- Experiment with the python scripts
- Read Capfile
- Read murder_config.rb
- Read murder_admin.rb
- Read murder.rb
- Read the reference below
- Experiment with the Capistrano deploy methods
- Read and experiment with the Murder Capistrano strategy

TASK REFERENCE
--------------

distribute_files: 
  SCPs a compressed version of all files from ./dist (the Python BitTorrent 
library and custom scripts) to all servers. The entire directory is sent, 
regardless of the role of each individual server. The path on the server is 
specified by remote_murder_path and will be cleared prior to transferring
files over.

start_tracker:
  Starts the BitTorrent tracker (essentially a mini-web-server) listening on 
port 8998.
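
For orientation, this is roughly what a client's request to the tracker looks
like under the standard BitTorrent HTTP tracker protocol. The Python sketch
below only builds the URL and sends nothing. The function name announce_url,
the hostname, the peer port 6881 and the peer id are hypothetical; BitTornado
builds the real request itself.

```python
import hashlib
from urllib.parse import urlencode


def announce_url(tracker_host, info_hash, peer_id, port, left, tracker_port=8998):
    """Build the GET URL a BitTorrent client requests to report its state."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    query = urlencode({
        "info_hash": info_hash,  # raw bytes are percent-encoded by urlencode
        "peer_id": peer_id,
        "port": port,            # port this client accepts peer connections on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,            # bytes still needed; 0 means the client is seeding
        "event": "started",
    })
    return f"http://{tracker_host}:{tracker_port}/announce?{query}"


if __name__ == "__main__":
    # Placeholder digest; a real info_hash is the SHA-1 of the torrent's info dict.
    info_hash = hashlib.sha1(b"placeholder info dict").digest()
    peer_id = b"-XX0001-" + b"0" * 12
    print(announce_url("tracker.example.com", info_hash, peer_id, 6881, left=0))
```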
  
stop_tracker:
  If the BitTorrent tracker is running, this will kill the process. Note that 
if it is not running, you will receive an error.

create_torrent:
  Compresses the directory specified by the passed-in argument 'files_path' 
and creates a .torrent file identified by the 'tag' argument. Be sure to use 
the same 'tag' value with any following commands. Any .git directories will be 
skipped. Once completed, the .torrent file will be downloaded to 
/tmp/TAG.tgz.torrent on your local machine.
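
The compression step with .git directories skipped can be sketched in Python
as follows. This is an illustration, not the code Murder runs on the seeder:
the names skip_git and make_tgz are hypothetical, and the archive layout
(paths relative to files_path) is an assumption.

```python
import os
import tarfile
import tempfile


def skip_git(tarinfo):
    """tarfile filter: drop any .git directory and everything beneath it."""
    parts = tarinfo.name.split("/")
    return None if ".git" in parts else tarinfo


def make_tgz(files_path, tgz_path):
    """Compress files_path into tgz_path, leaving out .git directories."""
    with tarfile.open(tgz_path, "w:gz") as tar:
        # arcname="." stores paths relative to files_path.
        tar.add(files_path, arcname=".", filter=skip_git)
    return tgz_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "files")
        os.makedirs(os.path.join(src, ".git"))
        for rel in ("app.txt", os.path.join(".git", "config")):
            with open(os.path.join(src, rel), "w") as handle:
                handle.write("example\n")
        archive = make_tgz(src, os.path.join(work, "Deploy20100101.tgz"))
        with tarfile.open(archive) as tar:
            print(tar.getnames())
```

Only path components exactly equal to .git are skipped, so a file such as
.gitignore is still included.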
  
download_torrent:
  Although not necessary to run, if the file from create_torrent was lost, you 
can redownload it from the seeder using this task. You must specify a valid 
'tag' argument.

start_seeding:
  Will cause the seeder machine to connect to the tracker and start seeding. 
The IP address returned by the 'host' shell command will be announced to the 
tracker. The server will not stop seeding until the stop_seeding task is 
called. You must specify a valid 'tag' argument (which identifies the .torrent 
in /tmp to use).
  
stop_seeding:
  If the seeder is currently seeding, this will kill the process. Note that if 
it is not running, you will receive an error. If a peer was downloading from 
this seed, the peer will find another host to receive any remaining data. You 
must specify a valid 'tag' argument.

stop_all_seeding:
  Identical to stop_seeding, except this will kill all seeding processes. No 
'tag' argument is needed.

peer:
  Instructs all the peer servers to connect to the tracker and start 
downloading and spreading pieces and files amongst themselves. You must specify 
a valid 'tag' argument. Once the download is complete on a server, that server 
will fork the download process and seed for 30 seconds while returning control 
to Capistrano. Cap will then extract the files into destination_path/TAG/, 
where destination_path is the passed-in 'destination_path' argument. To not 
create this tag-named directory, pass in the 'no_tag_directory=1' argument. If 
the directory is not empty, this command will fail. To clean it, pass in the 
'unsafe_please_delete=1' argument. The compressed tgz in /tmp is never 
removed. When this task completes, all files have been transferred and moved 
into the requested directory.
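
The destination handling can be modelled as below. This is a Python sketch
under stated assumptions, not the Capistrano code in murder.rb: it assumes the
check is meant to refuse a directory that already contains files, and the
function name prepare_destination is hypothetical.

```python
import os
import shutil
import tempfile


def prepare_destination(destination_path, tag, no_tag_directory=False,
                        unsafe_please_delete=False):
    """Return the directory to extract into, enforcing the emptiness check."""
    if no_tag_directory:
        target = destination_path
    else:
        target = os.path.join(destination_path, tag)
    if os.path.isdir(target) and os.listdir(target):
        if not unsafe_please_delete:
            raise RuntimeError(f"{target} is not empty")
        shutil.rmtree(target)  # assumed meaning of unsafe_please_delete=1
    os.makedirs(target, exist_ok=True)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(prepare_destination(root, "Deploy20100101"))
```

Making deletion an explicit opt-in keeps a mistyped destination_path from
wiping an unrelated directory.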

stop_all_peering:
  Sometimes peers can go on forever (usually because of an error). This 
command will forcibly kill all "murder_client.py peer" commands that are 
running.

CONFIG REFERENCE
----------------

user:
  Provided by Capistrano to specify a different username to use when SSHing
into machines.
  
host_suffix:
  This suffix is always appended to the tracker_host, seeder_host and peers 
hostnames when connecting to them.
  
default_tag:
  A tag name to use by default, so that a tag parameter doesn't need to be 
manually entered on every task. Not recommended, since files will be 
overwritten.

default_seeder_files_path:
  A path on the seeder's file system where the files to be distributed are 
stored.

default_destination_path:
  A path on the peers' file system where the files that were distributed 
should be decompressed into.

tracker_host:
  The hostname of a single tracker server. If you're using a host_suffix, do 
not specify a suffix here.
  
tracker_port:
  The port on which the tracker (the mini-web-server run by the BitTorrent 
libraries) listens. Must be 8998 for now.

seeder_host:
  The hostname of a single seeder server. This server is the one that has all 
the files that the peers want. If you're using a host_suffix, do not specify a 
suffix here.

peers:
  A list of peers which will be receiving the files and distributing them 
amongst themselves. If you're using a host_suffix, do not specify a suffix 
here.
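
The real settings live in murder_config.rb as Capistrano variables. The Python
sketch below only models how the values above relate to each other, in
particular how host_suffix is appended to every hostname. The class name
MurderConfig and the example hostnames are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MurderConfig:
    tracker_host: str
    seeder_host: str
    peers: list = field(default_factory=list)
    host_suffix: str = ""
    tracker_port: int = 8998  # the README says this must be 8998 for now

    def full_host(self, host):
        """Append host_suffix, which is why hosts are listed without it."""
        return host + self.host_suffix

    def peer_hosts(self):
        return [self.full_host(peer) for peer in self.peers]

    def tracker_address(self):
        return f"{self.full_host(self.tracker_host)}:{self.tracker_port}"


if __name__ == "__main__":
    config = MurderConfig(
        tracker_host="tracker01",
        seeder_host="seeder01",
        peers=["web001", "web002"],
        host_suffix=".example.com",
    )
    print(config.tracker_address())
    print(config.peer_hosts())
```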

CAPISTRANO COPY STRATEGY
------------------------

In addition to being usable from both the command line and Capistrano commands,
an optional Capistrano copy strategy is included to help retrofit existing
deploy environments you might already have.

Requirements to use build.rb
  - Add a require line for build.rb in your Capfile. This will override the
    default build strategy. There are some other parameters you must now
    specify though.
  - To enable it, add the following to your Capfile:
    load 'deploy'
    set :strategy, Capistrano::Deploy::Strategy::Build.new(self)
  - Add "set :murder, true" to your Capfile
  - Make sure your release_name is correct (should be already)
  - If your distribution is a file instead of a directory,
    add "distribution_is_a_file" to your Capfile
  - Add the following to your Capfile:
    set :build_task do
      "true"
    end
    set :copy_compression, :gz
    set :package_name do
      "#{release_name}-#{real_revision[0, 8]}.tar.#{copy_compression}"
    end
    set :branch, variables[:branch] || 'deploy'    
     
The requirements are a bit hefty for now, though hopefully this will get easier
in the future.