Example #1
def calc_csv(dataframe: object,
             save_folder: str,
             aa_column: str = 'Info_window_seq',
             Ncores: int = 1,
             chunksize: int = None):
    """
    Calculates conjoint triad features chunk by chunk from the input 'dataframe'
    and saves each processed chunk as a CSV file.

    The results are appended as new columns named feat_CT_{subsequence}, e.g. feat_CT_305.

    This is a RAM-efficient way of calculating the features: they are computed for one
    chunk of the dataframe (chunksize rows) at a time, and once a chunk has been processed
    and saved as a CSV, it is deleted to free up RAM.

    :param dataframe: A pandas DataFrame containing a column made up purely of amino-acid sequences (peptides).
    :param save_folder: Path to the folder where the output is saved.
    :param aa_column: Name of the column in dataframe containing the amino-acid sequences to process. default='Info_window_seq'
    :param Ncores: Number of CPU cores to use. default=1
    :param chunksize: Number of rows to process at a time. default=None (None means no chunking: the entire dataframe is processed at once)
    """

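    # Chunking, parallel execution across Ncores and CSV export are delegated to
    # `_utils.multiprocessing_export_csv`; `_algorithm` computes the features for a
    # chunk. Both are module-level helpers not shown in this excerpt.
    _utils.multiprocessing_export_csv(dataframe=dataframe,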
                                      function=_algorithm,
                                      Ncores=Ncores,
                                      chunksize=chunksize,
                                      save_folder=save_folder,
                                      aa_column=aa_column)
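Because neither `_algorithm` nor `_utils.multiprocessing_export_csv` is shown, the following is only a minimal sketch of what the docstring describes, under explicit assumptions. It assumes the seven-class amino-acid grouping commonly used for conjoint triads (AGV, ILFP, YMTS, HNQW, RK, DE, C) and encodes class indices as 0-6, so a column such as feat_CT_305 would hypothetically mean the triad (class 3, class 0, class 5). It works on a list of dicts instead of a pandas DataFrame and runs in a single process instead of using Ncores. The names `conjoint_triads`, `iter_chunks`, `calc_csv_sketch` and the `chunk_{n}.csv` file naming are hypothetical.

```python
import csv
import os
from itertools import product
from typing import Dict, Iterator, List, Optional

# ASSUMPTION: commonly used seven-class grouping for conjoint triads;
# the original `_algorithm` is not shown and may group residues differently.
AA_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, group in enumerate(AA_CLASSES) for aa in group}
# All 7**3 = 343 class triads, keyed by 0-based class digits (hypothetical naming).
TRIADS = ["".join(t) for t in product("0123456", repeat=3)]


def conjoint_triads(seq: str) -> Dict[str, int]:
    """Count each window of three consecutive residues by its class triad."""
    counts = dict.fromkeys(TRIADS, 0)
    classes = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    for i in range(len(classes) - 2):
        window = classes[i:i + 3]
        if None not in window:  # skip windows containing non-standard residues
            counts["".join(window)] += 1
    return counts


def iter_chunks(rows: List[dict], chunksize: Optional[int]) -> Iterator[List[dict]]:
    """Yield successive slices of `rows`; chunksize=None yields one chunk."""
    size = chunksize or len(rows) or 1
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def calc_csv_sketch(rows: List[dict], save_folder: str,
                    aa_column: str = "Info_window_seq",
                    chunksize: Optional[int] = None) -> List[str]:
    """Hypothetical single-process stand-in for calc_csv: one CSV per chunk."""
    os.makedirs(save_folder, exist_ok=True)
    paths = []
    for n, chunk in enumerate(iter_chunks(rows, chunksize)):
        out_rows = []
        for row in chunk:
            feats = conjoint_triads(row[aa_column])
            # Append one feat_CT_{triad} column per triad to the original row.
            out_rows.append({**row, **{f"feat_CT_{k}": v for k, v in feats.items()}})
        path = os.path.join(save_folder, f"chunk_{n}.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(out_rows[0]))
            writer.writeheader()
            writer.writerows(out_rows)
        paths.append(path)
        del out_rows  # free this chunk's feature rows before the next chunk
    return paths
```

Writing and discarding each chunk's feature rows before computing the next keeps the memory spent on computed features proportional to chunksize rather than to the whole table, which is the trade-off the docstring describes; the original additionally spreads the chunks over Ncores processes.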
Example #2
"""