def loadImages(self, datapath, dims=None, inputformat='stack', dtype='int16', startidx=None, stopidx=None): """ Loads an Images object from data stored as a binary image stack, tif, tif-stack, or png files. Supports single files or multiple files, stored on a local file system, a networked file sytem (mounted and available on all nodes), or Amazon S3. HDFS is not currently supported for image file data. Parameters ---------- datapath: string Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. A datapath argument may include a single '*' wildcard character in the filename. Examples of valid datapaths include 'a/local/relative/directory/*.stack", "s3n:///my-s3-bucket/data/mydatafile.tif", "/mnt/my/absolute/data/directory/", or "file:///mnt/another/data/directory/". dims: tuple of positive int, optional (but required if inputformat is 'stack') Dimensions of input image data, similar to a numpy 'shape' parameter, for instance (1024, 1024, 48). Binary stack data will be interpreted as coming from a multidimensional array of the specified dimensions. Stack data should be stored in row-major order (Fortran or Matlab convention) rather than column-major order (C or python/numpy convention), where the first dimension corresponds to that which is changing most rapidly on disk. So for instance given dims of (x, y, z), the coordinates of the data in a binary stack file should be ordered as [(x0, y0, z0), (x1, y0, zo), ..., (xN, y0, z0), (x0, y1, z0), (x1, y1, z0), ..., (xN, yM, z0), (x0, y0, z1), ..., (xN, yM, zP)]. If inputformat is 'png', 'tif', or'tif-stack', the dims parameter (if any) will be ignored; data dimensions will instead be read out from the image file headers. inputformat: {'stack', 'png', 'tif', 'tif-stack'}. optional, default 'stack' Expected format of the input data. 'stack' indicates flat files of raw binary data. 'png' or 'tif' indicate two-dimensional image files of the corresponding formats. 'tif-stack' indicates a sequence of multipage tif files, with each page of the tif corresponding to a separate z-plane. For all formats, separate files are interpreted as distinct time points, with ordering given by lexicographic sorting of file names. This method assumes that stack data consists of signed 16-bit integers in native byte order. Data types of image file data will be as specified in the file headers. dtype: string or numpy dtype. optional, default 'int16' Data type of the image files to be loaded, specified as a numpy "dtype" string. If inputformat is 'tif-stack', the dtype parameter (if any) will be ignored; data type will instead be read out from the tif headers. startidx: nonnegative int, optional startidx and stopidx are convenience parameters to allow only a subset of input files to be read in. These parameters give the starting index (inclusive) and final index (exclusive) of the data files to be used after lexicographically sorting all input data files matching the datapath argument. For example, startidx=None (the default) and stopidx=10 will cause only the first 10 data files in datapath to be read in; startidx=2 and stopidx=3 will cause only the third file (zero-based index of 2) to be read in. startidx and stopidx use the python slice indexing convention (zero-based indexing with an exclusive final position). stopidx: nonnegative int, optional See startidx. Returns ------- data: thunder.rdds.Images A newly-created Images object, wrapping an RDD of <int index, numpy array> key-value pairs. """ checkparams(inputformat, ['stack', 'png', 'tif', 'tif-stack']) from thunder.rdds.fileio.imagesloader import ImagesLoader loader = ImagesLoader(self._sc) if inputformat.lower() == 'stack': data = loader.fromStack(datapath, dims, dtype=dtype, startidx=startidx, stopidx=stopidx) elif inputformat.lower() == 'tif': data = loader.fromTif(datapath, startidx=startidx, stopidx=stopidx) elif inputformat.lower() == 'tif-stack': data = loader.fromMultipageTif(datapath, startidx=startidx, stopidx=stopidx) else: data = loader.fromPng(datapath) return data
def loadImages(self, dataPath, dims=None, dtype=None, inputFormat='stack', ext=None, startIdx=None, stopIdx=None, recursive=False, nplanes=None, npartitions=None, renumber=False, confFilename='conf.json'): """ Loads an Images object from data stored as a binary image stack, tif, or png files. Supports single files or multiple files, stored on a local file system, a networked file sytem (mounted and available on all nodes), or Amazon S3. HDFS is not currently supported for image file data. Parameters ---------- dataPath: string Path to data files or directory, as either a local filesystem path or a URI. May include a single '*' wildcard in the filename. Examples of valid dataPaths include 'local/directory/*.stack", "s3n:///my-s3-bucket/data/", or "file:///mnt/another/directory/". dims: tuple of positive int, optional (required if inputFormat is 'stack') Image dimensions. Binary stack data will be interpreted as a multidimensional array with the given dimensions, and should be stored in row-major order (Fortran or Matlab convention), where the first dimension changes most rapidly. For 'png' or 'tif' data dimensions will be read from the image file headers. inputFormat: str, optional, default = 'stack' Expected format of the input data: 'stack', 'png', or 'tif'. 'stack' indicates flat binary stacks. 'png' or 'tif' indicate image format. Page of a multipage tif file will be extend along the third dimension. Separate files interpreted as distinct records, with ordering given by lexicographic sorting of file names. ext: string, optional, default = None File extension, default will be "bin" if inputFormat=="stack", "tif" for inputFormat=='tif', and 'png' for inputFormat=="png". dtype: string or numpy dtype, optional, default = 'int16' Data type of the image files to be loaded, specified as a numpy "dtype" string. Ignored for 'tif' or 'png' (data will be inferred from image formats). startIdx: nonnegative int, optional, default = None Convenience parameters to read only a subset of input files. Uses python slice conventions (zero-based indexing with exclusive final position). These parameters give the starting and final index after lexicographic sorting. stopIdx: nonnegative int, optional, default = None See startIdx. recursive: boolean, optional, default = False If true, will recursively descend directories rooted at dataPath, loading all files in the tree with an appropriate extension. nplanes: positive integer, optional, default = None Subdivide individual image files. Every `nplanes` from each file will be considered a new record. With nplanes=None (the default), a single file will be considered as representing a single record. If the number of records per file is not the same across all files, then `renumber` should be set to True to ensure consistent keys. npartitions: positive int, optional, default = None Specify number of partitions for the RDD, if unspecified will use 1 partition per image. renumber: boolean, optional, default = False Recalculate keys for records after images are loading. Only necessary if different files contain different number of records (e.g. due to specifying nplanes). See Images.renumber(). confFilename : string, optional, default = 'conf.json' Name of conf file if using to specify parameters for binary stack data Returns ------- data: thunder.rdds.Images An Images object, wrapping an RDD of with (int) : (numpy array) pairs """ checkParams(inputFormat, ['stack', 'png', 'tif', 'tif-stack']) from thunder.rdds.fileio.imagesloader import ImagesLoader loader = ImagesLoader(self._sc) # Checking StartIdx is smaller or equal to StopIdx if startIdx is not None and stopIdx is not None and startIdx > stopIdx: raise Exception("Error. startIdx {} is larger than stopIdx {}".inputFormat(startIdx, stopIdx)) if not ext: ext = DEFAULT_EXTENSIONS.get(inputFormat.lower(), None) if inputFormat.lower() == 'stack': data = loader.fromStack(dataPath, dims=dims, dtype=dtype, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions, confFilename=confFilename) elif inputFormat.lower().startswith('tif'): data = loader.fromTif(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions) else: if nplanes: raise NotImplementedError("nplanes argument is not supported for png files") data = loader.fromPng(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, npartitions=npartitions) if not renumber: return data else: return data.renumber()
def loadImages(self, dataPath, dims=None, inputFormat='stack', ext=None, dtype='int16', startIdx=None, stopIdx=None, recursive=False, nplanes=None, npartitions=None, renumber=False): """ Loads an Images object from data stored as a binary image stack, tif, or png files. Supports single files or multiple files, stored on a local file system, a networked file sytem (mounted and available on all nodes), or Amazon S3. HDFS is not currently supported for image file data. Parameters ---------- dataPath: string Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. A dataPath argument may include a single '*' wildcard character in the filename. Examples of valid dataPaths include 'a/local/relative/directory/*.stack", "s3n:///my-s3-bucket/data/mydatafile.tif", "/mnt/my/absolute/data/directory/", or "file:///mnt/another/data/directory/". dims: tuple of positive int, optional (but required if inputFormat is 'stack') Dimensions of input image data, similar to a numpy 'shape' parameter, for instance (1024, 1024, 48). Binary stack data will be interpreted as coming from a multidimensional array of the specified dimensions. Stack data should be stored in row-major order (Fortran or Matlab convention) rather than column-major order (C or python/numpy convention), where the first dimension corresponds to that which is changing most rapidly on disk. So for instance given dims of (x, y, z), the coordinates of the data in a binary stack file should be ordered as [(x0, y0, z0), (x1, y0, zo), ..., (xN, y0, z0), (x0, y1, z0), (x1, y1, z0), ..., (xN, yM, z0), (x0, y0, z1), ..., (xN, yM, zP)]. If inputFormat is 'png' or 'tif', the dims parameter (if any) will be ignored; data dimensions will instead be read out from the image file headers. inputFormat: {'stack', 'png', 'tif'}. optional, default 'stack' Expected format of the input data. 'stack' indicates flat files of raw binary data. 'png' or 'tif' indicate image files of the corresponding formats. Each page of a multipage tif file will be interpreted as a separate z-plane. For all formats, separate files are interpreted as distinct time points, with ordering given by lexicographic sorting of file names. ext: string, optional, default None Extension required on data files to be loaded. By default will be "stack" if inputFormat=="stack", "tif" for inputFormat=='tif', and 'png' for inputFormat="png". dtype: string or numpy dtype. optional, default 'int16' Data type of the image files to be loaded, specified as a numpy "dtype" string. If inputFormat is 'tif' or 'png', the dtype parameter (if any) will be ignored; data type will instead be read out from the tif headers. startIdx: nonnegative int, optional startIdx and stopIdx are convenience parameters to allow only a subset of input files to be read in. These parameters give the starting index (inclusive) and final index (exclusive) of the data files to be used after lexicographically sorting all input data files matching the dataPath argument. For example, startIdx=None (the default) and stopIdx=10 will cause only the first 10 data files in dataPath to be read in; startIdx=2 and stopIdx=3 will cause only the third file (zero-based index of 2) to be read in. startIdx and stopIdx use the python slice indexing convention (zero-based indexing with an exclusive final position). stopIdx: nonnegative int, optional See startIdx. recursive: boolean, default False If true, will recursively descend directories rooted at dataPath, loading all files in the tree that have an appropriate extension. Recursive loading is currently only implemented for local filesystems (not s3). nplanes: positive integer, default None If passed, will cause a single image file to be subdivided into multiple records. Every `nplanes` z-planes (or multipage tif pages) in the file will be taken as a new record, with the first nplane planes of the first file being record 0, the second nplane planes being record 1, etc, until the first file is exhausted and record ordering continues with the first nplane planes of the second file, and so on. With nplanes=None (the default), a single file will be considered as representing a single record. Keys are calculated assuming that all input files contain the same number of records; if the number of records per file is not the same across all files, then `renumber` should be set to True to ensure consistent keys. npartitions: positive int, optional If specified, request a certain number of partitions for the underlying Spark RDD. Default is 1 partition per image file. renumber: boolean, optional, default False If renumber evaluates to True, then the keys for each record will be explicitly recalculated after all images are loaded. This should only be necessary at load time when different files contain different number of records. See Images.renumber(). Returns ------- data: thunder.rdds.Images A newly-created Images object, wrapping an RDD of <int index, numpy array> key-value pairs. """ checkParams(inputFormat, ['stack', 'png', 'tif', 'tif-stack']) from thunder.rdds.fileio.imagesloader import ImagesLoader loader = ImagesLoader(self._sc) if not ext: ext = DEFAULT_EXTENSIONS.get(inputFormat.lower(), None) if inputFormat.lower() == 'stack': data = loader.fromStack(dataPath, dims, dtype=dtype, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions) elif inputFormat.lower().startswith('tif'): data = loader.fromTif(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions) else: if nplanes: raise NotImplementedError("nplanes argument is not supported for png files") data = loader.fromPng(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, npartitions=npartitions) if not renumber: return data else: return data.renumber()
def loadImages(self, dataPath, dims=None, dtype=None, inputFormat='stack', ext=None, startIdx=None, stopIdx=None, recursive=False, nplanes=None, npartitions=None, renumber=False, confFilename='conf.json'): """ Loads an Images object from data stored as a binary image stack, tif, or png files. Supports single files or multiple files, stored on a local file system, a networked file sytem (mounted and available on all nodes), or Amazon S3. HDFS is not currently supported for image file data. Parameters ---------- dataPath: string Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. A dataPath argument may include a single '*' wildcard character in the filename. Examples of valid dataPaths include 'a/local/relative/directory/*.stack", "s3n:///my-s3-bucket/data/mydatafile.tif", "/mnt/my/absolute/data/directory/", or "file:///mnt/another/data/directory/". dims: tuple of positive int, optional (but required if inputFormat is 'stack') Dimensions of input image data, similar to a numpy 'shape' parameter, for instance (1024, 1024, 48). Binary stack data will be interpreted as coming from a multidimensional array of the specified dimensions. Stack data should be stored in row-major order (Fortran or Matlab convention) rather than column-major order (C or python/numpy convention), where the first dimension corresponds to that which is changing most rapidly on disk. So for instance given dims of (x, y, z), the coordinates of the data in a binary stack file should be ordered as [(x0, y0, z0), (x1, y0, zo), ..., (xN, y0, z0), (x0, y1, z0), (x1, y1, z0), ..., (xN, yM, z0), (x0, y0, z1), ..., (xN, yM, zP)]. If inputFormat is 'png' or 'tif', the dims parameter (if any) will be ignored; data dimensions will instead be read out from the image file headers. inputFormat: {'stack', 'png', 'tif'}. optional, default 'stack' Expected format of the input data. 'stack' indicates flat files of raw binary data. 'png' or 'tif' indicate image files of the corresponding formats. Each page of a multipage tif file will be interpreted as a separate z-plane. For all formats, separate files are interpreted as distinct time points, with ordering given by lexicographic sorting of file names. ext: string, optional, default None Extension required on data files to be loaded. By default will be "stack" if inputFormat=="stack", "tif" for inputFormat=='tif', and 'png' for inputFormat="png". dtype: string or numpy dtype. optional, default 'int16' Data type of the image files to be loaded, specified as a numpy "dtype" string. If inputFormat is 'tif' or 'png', the dtype parameter (if any) will be ignored; data type will instead be read out from the tif headers. startIdx: nonnegative int, optional startIdx and stopIdx are convenience parameters to allow only a subset of input files to be read in. These parameters give the starting index (inclusive) and final index (exclusive) of the data files to be used after lexicographically sorting all input data files matching the dataPath argument. For example, startIdx=None (the default) and stopIdx=10 will cause only the first 10 data files in dataPath to be read in; startIdx=2 and stopIdx=3 will cause only the third file (zero-based index of 2) to be read in. startIdx and stopIdx use the python slice indexing convention (zero-based indexing with an exclusive final position). stopIdx: nonnegative int, optional See startIdx. recursive: boolean, default False If true, will recursively descend directories rooted at dataPath, loading all files in the tree that have an appropriate extension. Recursive loading is currently only implemented for local filesystems (not s3). nplanes: positive integer, default None If passed, will cause a single image file to be subdivided into multiple records. Every `nplanes` z-planes (or multipage tif pages) in the file will be taken as a new record, with the first nplane planes of the first file being record 0, the second nplane planes being record 1, etc, until the first file is exhausted and record ordering continues with the first nplane planes of the second file, and so on. With nplanes=None (the default), a single file will be considered as representing a single record. Keys are calculated assuming that all input files contain the same number of records; if the number of records per file is not the same across all files, then `renumber` should be set to True to ensure consistent keys. npartitions: positive int, optional If specified, request a certain number of partitions for the underlying Spark RDD. Default is 1 partition per image file. renumber: boolean, optional, default False If renumber evaluates to True, then the keys for each record will be explicitly recalculated after all images are loaded. This should only be necessary at load time when different files contain different number of records. See Images.renumber(). Returns ------- data: thunder.rdds.Images A newly-created Images object, wrapping an RDD of <int index, numpy array> key-value pairs. """ checkParams(inputFormat, ['stack', 'png', 'tif', 'tif-stack']) from thunder.rdds.fileio.imagesloader import ImagesLoader loader = ImagesLoader(self._sc) if not ext: ext = DEFAULT_EXTENSIONS.get(inputFormat.lower(), None) if inputFormat.lower() == 'stack': data = loader.fromStack(dataPath, dims=dims, dtype=dtype, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions, confFilename=confFilename) elif inputFormat.lower().startswith('tif'): data = loader.fromTif(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, nplanes=nplanes, npartitions=npartitions) else: if nplanes: raise NotImplementedError( "nplanes argument is not supported for png files") data = loader.fromPng(dataPath, ext=ext, startIdx=startIdx, stopIdx=stopIdx, recursive=recursive, npartitions=npartitions) if not renumber: return data else: return data.renumber()