def test_SerializableLock():
    a = SerializableLock()
    b = SerializableLock()

    with a:
        pass

    with a:
        with b:
            pass

    with a:
        assert not a.acquire(False)

    a2 = pickle.loads(pickle.dumps(a))
    a3 = pickle.loads(pickle.dumps(a))
    a4 = pickle.loads(pickle.dumps(a2))

    for x in [a, a2, a3, a4]:
        for y in [a, a2, a3, a4]:
            with x:
                assert not y.acquire(False)

    b2 = pickle.loads(pickle.dumps(b))
    b3 = pickle.loads(pickle.dumps(b2))

    for x in [a, a2, a3, a4]:
        for y in [b, b2, b3]:
            with x:
                with y:
                    pass
            with y:
                with x:
                    pass
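# A minimal sketch of the invariant the test above exercises, assuming only
# the public SerializableLock API: pickling round-trips the lock's token, so
# every deserialized copy wraps the same underlying threading.Lock and the
# copies mutually exclude one another.
import pickle

from dask.utils import SerializableLock

lock = SerializableLock()
copy = pickle.loads(pickle.dumps(lock))
assert copy.token == lock.token  # identity travels through pickle
with lock:
    assert not copy.acquire(False)  # the copy contends for the same lock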
def test_SerializableLock_name_collision():
    a = SerializableLock('a')
    b = SerializableLock('b')
    c = SerializableLock('a')
    d = SerializableLock()

    assert a.lock is not b.lock
    assert a.lock is c.lock
    assert d.lock not in (a.lock, b.lock, c.lock)
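# The naming rule checked above, shown directly: an explicit token makes two
# SerializableLock instances share one underlying lock, while omitting the
# token always yields an independent lock.
from dask.utils import SerializableLock

first = SerializableLock('shared-token')
second = SerializableLock('shared-token')
assert first.lock is second.lock                   # same token -> same lock
assert SerializableLock().lock is not first.lock   # fresh token -> new lock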
def test_fuse_getitem_lock():
    lock1 = SerializableLock()
    lock2 = SerializableLock()
    pairs = [
        (
            (getter, (getter, "x", slice(1000, 2000), True, lock1), slice(15, 20)),
            (getter, "x", slice(1015, 1020), True, lock1),
        ),
        (
            (
                getitem,
                (getter, "x", (slice(1000, 2000), slice(100, 200)), True, lock1),
                (slice(15, 20), slice(50, 60)),
            ),
            (getter, "x", (slice(1015, 1020), slice(150, 160)), True, lock1),
        ),
        (
            (
                getitem,
                (
                    getter_nofancy,
                    "x",
                    (slice(1000, 2000), slice(100, 200)),
                    True,
                    lock1,
                ),
                (slice(15, 20), slice(50, 60)),
            ),
            (getter_nofancy, "x", (slice(1015, 1020), slice(150, 160)), True, lock1),
        ),
        (
            (
                getter,
                (getter, "x", slice(1000, 2000), True, lock1),
                slice(15, 20),
                True,
                lock2,
            ),
            (
                getter,
                (getter, "x", slice(1000, 2000), True, lock1),
                slice(15, 20),
                True,
                lock2,
            ),
        ),
    ]

    for inp, expected in pairs:
        result = optimize_slices({"y": inp})
        assert result == {"y": expected}
def get_features(cutout, module, features, tmpdir=None):
    """
    Load the feature data for a given module.

    This function loads the data for a set of features from a module.
    All modules in `atlite.datasets` are allowed.
    """
    parameters = cutout.data.attrs
    lock = SerializableLock()
    datasets = []
    get_data = datamodules[module].get_data

    for feature in features:
        feature_data = delayed(get_data)(
            cutout, feature, tmpdir=tmpdir, lock=lock, **parameters
        )
        datasets.append(feature_data)

    datasets = compute(*datasets)

    ds = xr.merge(datasets, compat="equals")
    for v in ds:
        ds[v].attrs["module"] = module
        fd = datamodules[module].features.items()
        ds[v].attrs["feature"] = [k for k, l in fd if v in l].pop()
    return ds
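# Hedged sketch of the pattern get_features relies on: one SerializableLock is
# created up front and handed to every delayed task, so the tasks can still
# serialize access to a non-thread-safe reader after the lock has been pickled
# to workers. read_chunk is a hypothetical stand-in for
# datamodules[module].get_data.
from dask import compute, delayed
from dask.utils import SerializableLock

def read_chunk(i, lock):
    with lock:  # only one task at a time enters this section
        return i * 2

shared = SerializableLock()
results = compute(*[delayed(read_chunk)(i, shared) for i in range(4)])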
def test_fuse_getitem_lock():
    lock1 = SerializableLock()
    lock2 = SerializableLock()
    pairs = [((getarray, (getarray, 'x', slice(1000, 2000), lock1), slice(15, 20)),
              (getarray, 'x', slice(1015, 1020), lock1)),
             ((getitem, (getarray, 'x', (slice(1000, 2000), slice(100, 200)), lock1),
               (slice(15, 20), slice(50, 60))),
              (getarray, 'x', (slice(1015, 1020), slice(150, 160)), lock1)),
             ((getitem, (getarray_nofancy, 'x',
                         (slice(1000, 2000), slice(100, 200)), lock1),
               (slice(15, 20), slice(50, 60))),
              (getarray_nofancy, 'x', (slice(1015, 1020), slice(150, 160)), lock1)),
             ((getarray, (getarray, 'x', slice(1000, 2000), lock1),
               slice(15, 20), lock2),
              (getarray, (getarray, 'x', slice(1000, 2000), lock1),
               slice(15, 20), lock2))]

    for inp, expected in pairs:
        result = optimize_slices({'y': inp})
        assert result == {'y': expected}
def test_h5py_serialize(c, s, a, b):
    from dask.utils import SerializableLock

    lock = SerializableLock('hdf5')
    with tmpfile() as fn:
        with h5py.File(fn, mode='a') as f:
            x = f.create_dataset('/group/x', shape=(4,), dtype='i4', chunks=(2,))
            x[:] = [1, 2, 3, 4]
        with h5py.File(fn, mode='r') as f:
            dset = f['/group/x']
            x = da.from_array(dset, chunks=dset.chunks, lock=lock)
            y = c.compute(x)
            y = yield y
            assert (y[:] == dset[:]).all()
async def test_h5py_serialize(c, s, a, b):
    from dask.utils import SerializableLock

    lock = SerializableLock("hdf5")
    with tmpfile() as fn:
        with h5py.File(fn, mode="a") as f:
            x = f.create_dataset("/group/x", shape=(4,), dtype="i4", chunks=(2,))
            x[:] = [1, 2, 3, 4]
        with h5py.File(fn, mode="r") as f:
            dset = f["/group/x"]
            x = da.from_array(dset, chunks=dset.chunks, lock=lock)
            y = c.compute(x)
            y = await y
            assert (y[:] == dset[:]).all()
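# Local sketch of the same h5py pattern without a distributed cluster: the
# SerializableLock passed to da.from_array serializes every chunk read of the
# (non-thread-safe) HDF5 dataset and survives pickling if the graph is later
# shipped to workers. "example.h5" is a hypothetical file name.
import dask.array as da
import h5py
from dask.utils import SerializableLock

with h5py.File("example.h5", mode="w") as f:
    f.create_dataset("x", data=[1, 2, 3, 4], chunks=(2,))
with h5py.File("example.h5", mode="r") as f:
    arr = da.from_array(f["x"], chunks=2, lock=SerializableLock("hdf5"))
    assert arr.sum().compute() == 10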
def prepare_zarr_storage(variations, out_path):
    store = zarr.DirectoryStore(str(out_path))
    root = zarr.group(store=store, overwrite=True)
    metadata = variations.metadata
    sources = []
    targets = []

    samples_array = variations.samples
    # samples_array.compute_chunk_sizes()
    sources.append(samples_array)

    object_codec = None
    if samples_array.dtype == object:
        object_codec = numcodecs.VLenUTF8()
    dataset = zarr.create(shape=samples_array.shape, path='samples', store=store,
                          dtype=samples_array.dtype, object_codec=object_codec)
    targets.append(dataset)

    variants = root.create_group(ZARR_VARIANTS_GROUP_NAME, overwrite=True)
    calls = root.create_group(ZARR_CALL_GROUP_NAME, overwrite=True)
    for field, array in variations.items():
        definition = ALLELE_ZARR_DEFINITION_MAPPINGS[field]
        field_metadata = metadata.get(field, None)
        array = variations[field]
        if array is None:
            continue
        array.compute_chunk_sizes()
        sources.append(array)

        group_name = definition['group']
        group = calls if group_name == ZARR_CALL_GROUP_NAME else variants

        path = os.path.sep + os.path.join(group.path, definition['field'])
        object_codec = None
        if array.dtype == object:
            object_codec = numcodecs.VLenUTF8()
        dataset = zarr.create(shape=array.shape, path=path, store=store,
                              object_codec=object_codec, dtype=array.dtype)

        if field_metadata is not None:
            for key, value in field_metadata.items():
                dataset.attrs[key] = value
        targets.append(dataset)

    lock = SerializableLock()
    return da.store(sources, targets, compute=False, lock=lock)
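# Minimal, self-contained sketch of the da.store + SerializableLock pattern
# used by prepare_zarr_storage, independent of the variations object: build a
# deferred store task with compute=False, then trigger the writes explicitly.
import dask.array as da
import zarr
from dask.utils import SerializableLock

source = da.ones((4, 4), chunks=(2, 2))
target = zarr.zeros((4, 4), chunks=(2, 2), dtype=source.dtype)
task = da.store([source], [target], compute=False, lock=SerializableLock())
task.compute()  # all chunk writes are serialized through the one lock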
import multiprocessing
import threading


def get_scheduler_lock(scheduler, path_or_file=None):
    """Get the appropriate lock for a certain situation based on the
    dask scheduler used.

    See Also
    --------
    dask.utils.get_scheduler_lock
    """
    if scheduler == 'distributed':
        from dask.distributed import Lock
        return Lock(path_or_file)
    elif scheduler == 'multiprocessing':
        return multiprocessing.Lock()
    elif scheduler == 'threaded':
        from dask.utils import SerializableLock
        return SerializableLock()
    else:
        return threading.Lock()
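# Example dispatch for the local schedulers (the 'distributed' branch needs a
# running cluster, so it is not exercised here):
lock = get_scheduler_lock('threaded')            # SerializableLock
mp_lock = get_scheduler_lock('multiprocessing')  # multiprocessing.Lock
default = get_scheduler_lock(None)               # plain threading.Lock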
import threading
import weakref
from typing import Any, MutableMapping

try:
    from dask.utils import SerializableLock
except ImportError:
    # no need to worry about serializing the lock
    SerializableLock = threading.Lock

try:
    from dask.distributed import Lock as DistributedLock
except ImportError:
    DistributedLock = None


# Locks used by multiple backends.
# Neither HDF5 nor the netCDF-C library are thread-safe.
HDF5_LOCK = SerializableLock()
NETCDFC_LOCK = SerializableLock()

_FILE_LOCKS: MutableMapping[Any, threading.Lock] = weakref.WeakValueDictionary()


def _get_threaded_lock(key):
    try:
        lock = _FILE_LOCKS[key]
    except KeyError:
        lock = _FILE_LOCKS[key] = threading.Lock()
    return lock


def _get_multiprocessing_lock(key):
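# The weakref cache above hands back the same lock for a given key for as long
# as a reference to it is alive, so separate openers of one file share a lock.
# "data.nc" is a hypothetical key.
first = _get_threaded_lock("data.nc")
second = _get_threaded_lock("data.nc")
assert first is second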
def test_SerializableLock_acquire_blocking():
    a = SerializableLock("a")
    assert a.acquire(blocking=True)
    assert not a.acquire(blocking=False)
    a.release()
def test_SerializableLock_locked():
    a = SerializableLock("a")
    assert not a.locked()
    with a:
        assert a.locked()
    assert not a.locked()
import logging
import threading
import time
import traceback
import warnings
from collections import OrderedDict
from collections.abc import Mapping

import numpy as np

from ..conventions import cf_encoder
from ..core import indexing
from ..core.pycompat import dask_array_type, iteritems
from ..core.utils import FrozenOrderedDict, NdimSizeLenMixin

# Import default lock
try:
    from dask.utils import SerializableLock
    HDF5_LOCK = SerializableLock()
except ImportError:
    HDF5_LOCK = threading.Lock()

# Create a logger object, but don't add any handlers. Leave that to user code.
logger = logging.getLogger(__name__)

NONE_VAR_NAME = '__values__'


def get_scheduler(get=None, collection=None):
    """Determine the dask scheduler that is being used.

    None is returned if no dask scheduler is active.

    See also
    """
from dask.utils import SerializableLock
from donfig import Config

from cmip6_downscaling.config import _defaults

config = Config("cmip6_downscaling", defaults=[_defaults])
config.config_lock = SerializableLock()

CLIMATE_NORMAL_PERIOD = (1970, 2000)
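# Hypothetical use of the lock attached above: serialize concurrent updates to
# the shared donfig Config ("run_id" is an invented key; config.set follows
# donfig's dask-style API).
with config.config_lock:
    config.set({"run_id": "test"})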