def thread(): """Thread spawned by gevent""" if not hasattr(ctx, 'manager'): first_mon = teuthology.get_first_mon(ctx, config) (mon,) = ctx.cluster.only(first_mon).remotes.iterkeys() ctx.manager = CephManager( mon, ctx=ctx, logger=log.getChild('ceph_manager'), ) clients = ['client.{id}'.format(id=id_) for id_ in teuthology.all_roles_of_type(ctx.cluster, 'client')] log.info('clients are %s' % clients) if config.get('ec_pool', False): erasure_code_profile = config.get('erasure_code_profile', {}) erasure_code_profile_name = erasure_code_profile.get('name', False) ctx.manager.create_erasure_code_profile(erasure_code_profile_name, **erasure_code_profile) else: erasure_code_profile_name = False for i in range(int(config.get('runs', '1'))): log.info("starting run %s out of %s", str(i), config.get('runs', '1')) tests = {} existing_pools = config.get('pools', []) created_pools = [] for role in config.get('clients', clients): assert isinstance(role, basestring) PREFIX = 'client.' assert role.startswith(PREFIX) id_ = role[len(PREFIX):] pool = config.get('pool', None) if not pool and existing_pools: pool = existing_pools.pop() else: pool = ctx.manager.create_pool_with_unique_name(erasure_code_profile_name=erasure_code_profile_name) created_pools.append(pool) (remote,) = ctx.cluster.only(role).remotes.iterkeys() proc = remote.run( args=["CEPH_CLIENT_ID={id_}".format(id_=id_)] + args + ["--pool", pool], logger=log.getChild("rados.{id}".format(id=id_)), stdin=run.PIPE, wait=False ) tests[id_] = proc run.wait(tests.itervalues()) for pool in created_pools: ctx.manager.remove_pool(pool)
def thread():
    if not hasattr(ctx, 'manager'):
        first_mon = teuthology.get_first_mon(ctx, config)
        (mon,) = ctx.cluster.only(first_mon).remotes.iterkeys()
        ctx.manager = CephManager(
            mon,
            ctx=ctx,
            logger=log.getChild('ceph_manager'),
            )

    clients = ['client.{id}'.format(id=id_)
               for id_ in teuthology.all_roles_of_type(ctx.cluster, 'client')]
    log.info('clients are %s' % clients)

    for i in range(int(config.get('runs', '1'))):
        log.info("starting run %s out of %s", str(i), config.get('runs', '1'))
        tests = {}
        pools = []
        for role in config.get('clients', clients):
            assert isinstance(role, basestring)
            PREFIX = 'client.'
            assert role.startswith(PREFIX)
            id_ = role[len(PREFIX):]

            pool = 'radosmodel-%s' % id_
            pools.append(pool)
            ctx.manager.create_pool(pool)

            (remote,) = ctx.cluster.only(role).remotes.iterkeys()
            proc = remote.run(
                args=["CEPH_CLIENT_ID={id_}".format(id_=id_)] + args +
                ["--pool", pool],
                logger=log.getChild("rados.{id}".format(id=id_)),
                stdin=run.PIPE,
                wait=False
                )
            tests[id_] = proc

        run.wait(tests.itervalues())

        for pool in pools:
            ctx.manager.remove_pool(pool)
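# The docstring above notes the worker is "spawned by gevent". A minimal sketch
# of how a surrounding task might launch and join it; the wrapper name below is
# hypothetical, and it assumes the thread() defined above plus a gevent install.
import contextlib

import gevent


@contextlib.contextmanager
def _run_rados_workload():
    # spawn the workload in the background, yield control to the rest of the
    # job, and join on teardown; get() re-raises any exception from thread()
    running = gevent.spawn(thread)
    try:
        yield
    finally:
        running.get()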
def task(ctx, config):
    if config is None:
        config = {}

    assert isinstance(config, dict), \
        "task only supports a dictionary for configuration"

    overrides = ctx.config.get('overrides', {})
    teuthology.deep_merge(config, overrides.get('ceph', {}))
    log.info('Config: ' + str(config))

    testdir = teuthology.get_testdir(ctx)

    # set up cluster context
    first_ceph_cluster = False
    if not hasattr(ctx, 'daemons'):
        first_ceph_cluster = True
    if not hasattr(ctx, 'ceph'):
        ctx.ceph = {}
        ctx.managers = {}
    if 'cluster' not in config:
        config['cluster'] = 'ceph'
    cluster_name = config['cluster']
    ctx.ceph[cluster_name] = argparse.Namespace()
    ctx.ceph[cluster_name].thrashers = []
    # fixme: setup watchdog, ala ceph.py

    # cephadm mode?
    if 'cephadm_mode' not in config:
        config['cephadm_mode'] = 'root'
    assert config['cephadm_mode'] in ['root', 'cephadm-package']
    if config['cephadm_mode'] == 'root':
        ctx.cephadm = testdir + '/cephadm'
    else:
        ctx.cephadm = 'cephadm'  # in the path

    if first_ceph_cluster:
        # FIXME: this is global for all clusters
        ctx.daemons = DaemonGroup(use_cephadm=ctx.cephadm)

    # image
    ctx.ceph[cluster_name].image = config.get('image')
    ref = None
    if not ctx.ceph[cluster_name].image:
        sha1 = config.get('sha1')
        if sha1:
            ctx.ceph[cluster_name].image = 'quay.io/ceph-ci/ceph:%s' % sha1
            ref = sha1
        else:
            # hmm, fall back to branch?
            branch = config.get('branch', 'master')
            ref = branch
            ctx.ceph[cluster_name].image = 'quay.io/ceph-ci/ceph:%s' % branch
    log.info('Cluster image is %s' % ctx.ceph[cluster_name].image)

    # uuid
    fsid = str(uuid.uuid1())
    log.info('Cluster fsid is %s' % fsid)
    ctx.ceph[cluster_name].fsid = fsid

    # mon ips
    log.info('Choosing monitor IPs and ports...')
    remotes_and_roles = ctx.cluster.remotes.items()
    roles = [role_list for (remote, role_list) in remotes_and_roles]
    ips = [host for (host, port) in
           (remote.ssh.get_transport().getpeername()
            for (remote, role_list) in remotes_and_roles)]
    ctx.ceph[cluster_name].mons = get_mons(
        roles, ips, cluster_name,
        mon_bind_msgr2=config.get('mon_bind_msgr2', True),
        mon_bind_addrvec=config.get('mon_bind_addrvec', True),
        )
    log.info('Monitor IPs: %s' % ctx.ceph[cluster_name].mons)

    with contextutil.nested(
            lambda: ceph_initial(),
            lambda: normalize_hostnames(ctx=ctx),
            lambda: download_cephadm(ctx=ctx, config=config, ref=ref),
            lambda: ceph_log(ctx=ctx, config=config),
            lambda: ceph_crash(ctx=ctx, config=config),
            lambda: ceph_bootstrap(ctx=ctx, config=config),
            lambda: crush_setup(ctx=ctx, config=config),
            lambda: ceph_mons(ctx=ctx, config=config),
            lambda: ceph_mgrs(ctx=ctx, config=config),
            lambda: ceph_osds(ctx=ctx, config=config),
            lambda: ceph_mdss(ctx=ctx, config=config),
            lambda: ceph_clients(ctx=ctx, config=config),
            lambda: distribute_config_and_admin_keyring(ctx=ctx, config=config),
    ):
        ctx.managers[cluster_name] = CephManager(
            ctx.ceph[cluster_name].bootstrap_remote,
            ctx=ctx,
            logger=log.getChild('ceph_manager.' + cluster_name),
            cluster=cluster_name,
            cephadm=True,
        )

        try:
            if config.get('wait-for-healthy', True):
                healthy(ctx=ctx, config=config)

            log.info('Setup complete, yielding')
            yield

        finally:
            log.info('Teardown begin')
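# The task above (and the variants below) hands contextutil.nested a series of
# zero-argument callables, each returning a context manager: setup runs in list
# order and teardown unwinds in reverse. A rough stand-in using only the
# standard library (Python 3 ExitStack shown for brevity; this is not
# teuthology's contextutil implementation):
import contextlib


@contextlib.contextmanager
def nested(*mgr_fns):
    # call each factory, enter its context manager in order, and let ExitStack
    # exit them in reverse order on the way out
    with contextlib.ExitStack() as stack:
        yield [stack.enter_context(fn()) for fn in mgr_fns]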
def task(ctx, config): """ Set up and tear down a Ceph cluster. For example:: tasks: - ceph: - interactive: You can also specify what branch to run:: tasks: - ceph: branch: foo Or a tag:: tasks: - ceph: tag: v0.42.13 Or a sha1:: tasks: - ceph: sha1: 1376a5ab0c89780eab39ffbbe436f6a6092314ed Or a local source dir:: tasks: - ceph: path: /home/sage/ceph To capture code coverage data, use:: tasks: - ceph: coverage: true To use btrfs, ext4, or xfs on the target's scratch disks, use:: tasks: - ceph: fs: xfs mkfs_options: [-b,size=65536,-l,logdev=/dev/sdc1] mount_options: [nobarrier, inode64] Note, this will cause the task to check the /scratch_devs file on each node for available devices. If no such file is found, /dev/sdb will be used. To run some daemons under valgrind, include their names and the tool/args to use in a valgrind section:: tasks: - ceph: valgrind: mds.1: --tool=memcheck osd.1: [--tool=memcheck, --leak-check=no] Those nodes which are using memcheck or valgrind will get checked for bad results. To adjust or modify config options, use:: tasks: - ceph: conf: section: key: value For example:: tasks: - ceph: conf: mds.0: some option: value other key: other value client.0: debug client: 10 debug ms: 1 By default, the cluster log is checked for errors and warnings, and the run marked failed if any appear. You can ignore log entries by giving a list of egrep compatible regexes, i.e.: tasks: - ceph: log-whitelist: ['foo.*bar', 'bad message'] To run multiple ceph clusters, use multiple ceph tasks, and roles with a cluster name prefix, e.g. cluster1.client.0. Roles with no cluster use the default cluster name, 'ceph'. OSDs from separate clusters must be on separate hosts. Clients and non-osd daemons from multiple clusters may be colocated. For each cluster, add an instance of the ceph task with the cluster name specified, e.g.:: roles: - [mon.a, osd.0, osd.1] - [backup.mon.a, backup.osd.0, backup.osd.1] - [client.0, backup.client.0] tasks: - ceph: cluster: ceph - ceph: cluster: backup :param ctx: Context :param config: Configuration """ if config is None: config = {} assert isinstance(config, dict), \ "task ceph only supports a dictionary for configuration" overrides = ctx.config.get('overrides', {}) teuthology.deep_merge(config, overrides.get('ceph', {})) first_ceph_cluster = False if not hasattr(ctx, 'daemons'): first_ceph_cluster = True ctx.daemons = DaemonGroup() testdir = teuthology.get_testdir(ctx) if config.get('coverage'): coverage_dir = '{tdir}/archive/coverage'.format(tdir=testdir) log.info('Creating coverage directory...') run.wait( ctx.cluster.run( args=[ 'install', '-d', '-m0755', '--', coverage_dir, ], wait=False, )) if 'cluster' not in config: config['cluster'] = 'ceph' validate_config(ctx, config) subtasks = [] if first_ceph_cluster: # these tasks handle general log setup and parsing on all hosts, # so they should only be run once subtasks = [ lambda: ceph_log(ctx=ctx, config=None), lambda: valgrind_post(ctx=ctx, config=config), ] subtasks += [ lambda: cluster(ctx=ctx, config=dict( conf=config.get('conf', {}), fs=config.get('fs', None), mkfs_options=config.get('mkfs_options', None), mount_options=config.get('mount_options', None), block_journal=config.get('block_journal', None), tmpfs_journal=config.get('tmpfs_journal', None), log_whitelist=config.get('log-whitelist', []), cpu_profile=set(config.get('cpu_profile', []), ), cluster=config['cluster'], )), lambda: run_daemon(ctx=ctx, config=config, type_='mon'), lambda: crush_setup(ctx=ctx, config=config), lambda: 
run_daemon(ctx=ctx, config=config, type_='osd'), lambda: cephfs_setup(ctx=ctx, config=config), lambda: run_daemon(ctx=ctx, config=config, type_='mds'), ] with contextutil.nested(*subtasks): try: if config.get('wait-for-healthy', True): healthy(ctx=ctx, config=dict(cluster=config['cluster'])) first_mon = teuthology.get_first_mon(ctx, config, config['cluster']) (mon, ) = ctx.cluster.only(first_mon).remotes.iterkeys() if not hasattr(ctx, 'managers'): ctx.managers = {} ctx.managers[config['cluster']] = CephManager( mon, ctx=ctx, logger=log.getChild('ceph_manager.' + config['cluster']), cluster=config['cluster'], ) yield finally: if config.get('wait-for-scrub', True): osd_scrub_pgs(ctx, config)
def task(ctx, config): """ replace a monitor with a newly added one, and then revert this change How it works:: 1. add a mon with specified id (mon.victim_prime) 2. wait for quorum 3. remove a monitor with specified id (mon.victim), mon.victim will commit suicide 4. wait for quorum 5. <yield> 5. add mon.a back, and start it 6. wait for quorum 7. remove mon.a_prime Options:: victim the id of the mon to be removed (pick a random mon by default) replacer the id of the new mon (use "${victim}_prime" if not specified) """ first_mon = teuthology.get_first_mon(ctx, config) (mon,) = ctx.cluster.only(first_mon).remotes.iterkeys() manager = CephManager(mon, ctx=ctx, logger=log.getChild('ceph_manager')) if config is None: config = {} assert isinstance(config, dict), \ "task ceph only supports a dictionary for configuration" overrides = ctx.config.get('overrides', {}) teuthology.deep_merge(config, overrides.get('mon_seesaw', {})) victim = config.get('victim', random.choice(_get_mons(ctx))) replacer = config.get('replacer', '{0}_prime'.format(victim)) remote = manager.find_remote('mon', victim) quorum = manager.get_mon_quorum() cluster = manager.cluster log.info('replacing {victim} with {replacer}'.format(victim=victim, replacer=replacer)) with _prepare_mon(ctx, manager, remote, replacer): with _run_daemon(ctx, remote, cluster, 'mon', replacer): # replacer will join the quorum automatically manager.wait_for_mon_quorum_size(len(quorum) + 1, 10) # if we don't remove the victim from monmap, there is chance that # we are leaving the new joiner with a monmap of 2 mon, and it will # not able to reach the other one, it will be keeping probing for # ever. log.info('removing {mon}'.format(mon=victim)) manager.raw_cluster_cmd('mon', 'remove', victim) manager.wait_for_mon_quorum_size(len(quorum), 10) # the victim will commit suicide after being removed from # monmap, let's wait until it stops. ctx.daemons.get_daemon('mon', victim, cluster).wait(10) try: # perform other tasks yield finally: # bring the victim back online # nuke the monstore of victim, otherwise it will refuse to boot # with following message: # # not in monmap and have been in a quorum before; must have # been removed log.info('re-adding {mon}'.format(mon=victim)) data_path = '/var/lib/ceph/mon/{cluster}-{id}'.format( cluster=cluster, id=victim) remote.run(args=['sudo', 'rm', '-rf', data_path]) name = 'mon.{0}'.format(victim) _setup_mon(ctx, manager, remote, victim, name, data_path, None) log.info('reviving {mon}'.format(mon=victim)) manager.revive_mon(victim) manager.wait_for_mon_quorum_size(len(quorum) + 1, 10) manager.raw_cluster_cmd('mon', 'remove', replacer) manager.wait_for_mon_quorum_size(len(quorum), 10)
def task(ctx, config): """ Populate <num_pools> pools with prefix <pool_prefix> with <num_images> rbd images at <num_snaps> snaps The config could be as follows:: populate_rbd_pool: client: <client> pool_prefix: foo num_pools: 5 num_images: 10 num_snaps: 3 image_size: 10737418240 """ if config is None: config = {} client = config.get("client", "client.0") pool_prefix = config.get("pool_prefix", "foo") num_pools = config.get("num_pools", 2) num_images = config.get("num_images", 20) num_snaps = config.get("num_snaps", 4) image_size = config.get("image_size", 100) write_size = config.get("write_size", 1024 * 1024) write_threads = config.get("write_threads", 10) write_total_per_snap = config.get("write_total_per_snap", 1024 * 1024 * 30) (remote, ) = ctx.cluster.only(client).remotes.iterkeys() if not hasattr(ctx, 'manager'): first_mon = teuthology.get_first_mon(ctx, config) (mon, ) = ctx.cluster.only(first_mon).remotes.iterkeys() ctx.manager = CephManager( mon, ctx=ctx, logger=log.getChild('ceph_manager'), ) for poolid in range(num_pools): poolname = "%s-%s" % (pool_prefix, str(poolid)) log.info("Creating pool %s" % (poolname, )) ctx.manager.create_pool(poolname) for imageid in range(num_images): imagename = "rbd-%s" % (str(imageid), ) log.info("Creating imagename %s" % (imagename, )) remote.run(args=[ "rbd", "create", imagename, "--image-format", "1", "--size", str(image_size), "--pool", str(poolname) ]) def bench_run(): remote.run(args=[ "rbd", "bench-write", imagename, "--pool", poolname, "--io-size", str(write_size), "--io-threads", str(write_threads), "--io-total", str(write_total_per_snap), "--io-pattern", "rand" ]) log.info("imagename %s first bench" % (imagename, )) bench_run() for snapid in range(num_snaps): snapname = "snap-%s" % (str(snapid), ) log.info("imagename %s creating snap %s" % (imagename, snapname)) remote.run(args=[ "rbd", "snap", "create", "--pool", poolname, "--snap", snapname, imagename ]) bench_run() try: yield finally: log.info('done')
def task(ctx, config): """ replace a monitor with a newly added one, and then revert this change How it works:: 1. add a mon with specified id (mon.victim_prime) 2. wait for quorum 3. remove a monitor with specified id (mon.victim), mon.victim will commit suicide 4. wait for quorum 5. <yield> 5. add mon.a back, and start it 6. wait for quorum 7. remove mon.a_prime Options:: victim the id of the mon to be removed (pick a random mon by default) replacer the id of the new mon (use "${victim}_prime" if not specified) """ first_mon = teuthology.get_first_mon(ctx, config) (mon, ) = ctx.cluster.only(first_mon).remotes.iterkeys() manager = CephManager(mon, ctx=ctx, logger=log.getChild('ceph_manager')) if config is None: config = {} assert isinstance(config, dict), \ "task ceph only supports a dictionary for configuration" overrides = ctx.config.get('overrides', {}) teuthology.deep_merge(config, overrides.get('mon_seesaw', {})) victim = config.get('victim', random.choice(_get_mons(ctx))) replacer = config.get('replacer', '{0}_prime'.format(victim)) remote = manager.find_remote('mon', victim) quorum = manager.get_mon_quorum() cluster = manager.cluster log.info('replacing {victim} with {replacer}'.format(victim=victim, replacer=replacer)) with _prepare_mon(ctx, manager, remote, replacer): with _run_daemon(ctx, remote, cluster, 'mon', replacer): # replacer will join the quorum automatically manager.wait_for_mon_quorum_size(len(quorum) + 1, 10) # if we don't remove the victim from monmap, there is chance that # we are leaving the new joiner with a monmap of 2 mon, and it will # not able to reach the other one, it will be keeping probing for # ever. log.info('removing {mon}'.format(mon=victim)) manager.raw_cluster_cmd('mon', 'remove', victim) manager.wait_for_mon_quorum_size(len(quorum), 10) # the victim will commit suicide after being removed from # monmap, let's wait until it stops. ctx.daemons.get_daemon('mon', victim, cluster).wait(10) try: # perform other tasks yield finally: # bring the victim back online # nuke the monstore of victim, otherwise it will refuse to boot # with following message: # # not in monmap and have been in a quorum before; must have # been removed log.info('re-adding {mon}'.format(mon=victim)) data_path = '/var/lib/ceph/mon/{cluster}-{id}'.format( cluster=cluster, id=victim) remote.run(args=['sudo', 'rm', '-rf', data_path]) name = 'mon.{0}'.format(victim) _setup_mon(ctx, manager, remote, victim, name, data_path, None) log.info('reviving {mon}'.format(mon=victim)) manager.revive_mon(victim) manager.wait_for_mon_quorum_size(len(quorum) + 1, 10) manager.raw_cluster_cmd('mon', 'remove', replacer) manager.wait_for_mon_quorum_size(len(quorum), 10)
def task(ctx, config):
    if config is None:
        config = {}

    assert isinstance(config, dict), \
        "task only supports a dictionary for configuration"

    overrides = ctx.config.get('overrides', {})
    teuthology.deep_merge(config, overrides.get('ceph', {}))
    log.info('Config: ' + str(config))

    testdir = teuthology.get_testdir(ctx)

    # set up cluster context
    first_ceph_cluster = False
    if not hasattr(ctx, 'daemons'):
        first_ceph_cluster = True
        ctx.daemons = DaemonGroup(
            use_ceph_daemon='{}/ceph-daemon'.format(testdir))
    if not hasattr(ctx, 'ceph'):
        ctx.ceph = {}
        ctx.managers = {}
    if 'cluster' not in config:
        config['cluster'] = 'ceph'
    cluster_name = config['cluster']
    ctx.ceph[cluster_name] = argparse.Namespace()

    #validate_config(ctx, config)

    # image
    ctx.image = config.get('image')
    ref = None
    if not ctx.image:
        sha1 = config.get('sha1')
        if sha1:
            ctx.image = 'quay.io/ceph-ci/ceph:%s' % sha1
            ref = sha1
        else:
            # hmm, fall back to branch?
            branch = config.get('branch', 'master')
            ref = branch
            # FIXME when ceph-ci builds all branches
            if branch in ['master', 'nautilus']:
                ctx.image = 'ceph/daemon-base:latest-%s-devel' % branch
            else:
                ctx.image = 'quay.io/ceph-ci/ceph:%s' % branch
    log.info('Cluster image is %s' % ctx.image)

    # uuid
    fsid = str(uuid.uuid1())
    log.info('Cluster fsid is %s' % fsid)
    ctx.ceph[cluster_name].fsid = fsid

    # mon ips
    log.info('Choosing monitor IPs and ports...')
    remotes_and_roles = ctx.cluster.remotes.items()
    roles = [role_list for (remote, role_list) in remotes_and_roles]
    ips = [host for (host, port) in
           (remote.ssh.get_transport().getpeername()
            for (remote, role_list) in remotes_and_roles)]
    ctx.ceph[cluster_name].mons = get_mons(
        roles, ips, cluster_name,
        mon_bind_msgr2=config.get('mon_bind_msgr2', True),
        mon_bind_addrvec=config.get('mon_bind_addrvec', True),
        )
    log.info('Monitor IPs: %s' % ctx.ceph[cluster_name].mons)

    with contextutil.nested(
            lambda: ceph_initial(),
            lambda: normalize_hostnames(ctx=ctx),
            lambda: download_ceph_daemon(ctx=ctx, config=config, ref=ref),
            lambda: ceph_log(ctx=ctx, config=config),
            lambda: ceph_crash(ctx=ctx, config=config),
            lambda: ceph_bootstrap(ctx=ctx, config=config),
            lambda: ceph_mons(ctx=ctx, config=config),
            lambda: ceph_mgrs(ctx=ctx, config=config),
            lambda: ceph_osds(ctx=ctx, config=config),
            lambda: ceph_mdss(ctx=ctx, config=config),
            lambda: distribute_config_and_admin_keyring(ctx=ctx, config=config),
    ):
        ctx.managers[cluster_name] = CephManager(
            ctx.ceph[cluster_name].bootstrap_remote,
            ctx=ctx,
            logger=log.getChild('ceph_manager.' + cluster_name),
            cluster=cluster_name,
            ceph_daemon=True,
        )

        try:
            if config.get('wait-for-healthy', True):
                healthy(ctx=ctx, config=config)

            log.info('Setup complete, yielding')
            yield

        finally:
            log.info('Teardown begin')