Python KmerHelper.create_kmers_from_string examples

Programming language: Python

Namespace/package: immuneML.util.KmerHelper

Class/type: KmerHelper

Method/function: create_kmers_from_string

Examples on hotexamples.com: 3

Python KmerHelper.create_kmers_from_string: 3 examples found. These are the top-rated real-world Python examples of immuneML.util.KmerHelper.KmerHelper.create_kmers_from_string, extracted from open source projects.

Frequently used methods

create_kmers_from_sequence(4)

create_all_kmers(3)

create_kmers_from_string(3)

create_IMGT_gapped_kmers_from_sequence(2)

create_IMGT_kmers_from_sequence(2)

create_kmers_within_HD(2)

create_sentences_from_repertoire(2)

create_gapped_kmers_from_sequence(1)

create_gapped_kmers_from_string(1)

Example #1

    def test_create_kmers_from_string(self):
        # A 7-character string yields 7 - 3 + 1 = 5 overlapping 3-mers.
        kmers = KmerHelper.create_kmers_from_string("ABCDEFG", 3)
        for expected in ("ABC", "BCD", "CDE", "DEF", "EFG"):
            self.assertIn(expected, kmers)
        self.assertEqual(5, len(kmers))

        # A string shorter than k yields no k-mers.
        kmers = KmerHelper.create_kmers_from_string("AB", 3)
        self.assertEqual(0, len(kmers))
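
The assertions in this test suggest sliding-window behavior: every contiguous substring of length k, and an empty result when the input is shorter than k. A minimal stand-in consistent with that test might look as follows. The name `sliding_kmers` is hypothetical, k is assumed to be a positive integer, and this is a sketch rather than immuneML's actual implementation.

```python
def sliding_kmers(sequence: str, k: int) -> list:
    """Hypothetical stand-in: all overlapping substrings of length k, left to right."""
    # range() is empty when len(sequence) < k, so short inputs yield [].
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(sliding_kmers("ABCDEFG", 3))  # ['ABC', 'BCD', 'CDE', 'DEF', 'EFG']
print(sliding_kmers("AB", 3))       # []
```

A string of length n produces n - k + 1 k-mers, which is where the expected count of 5 in the test comes from.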

Example #2

File: Util.py Project: uio-bmi/immuneML

    def compute_tcrb_relative_abundance(sequences: np.ndarray, counts: np.ndarray, k: int) -> dict:
        """
        Computes the relative abundance of k-mers in the repertoire according to the following equations, where C is the template count for the
        given receptor sequence and T is the total count across all receptor sequences. The relative abundance is computed per receptor sequence,
        and only the maximum sequence abundance is used for each k-mer, so that the k-mer's relative abundance equals the abundance of the most
        frequent receptor sequence in which the k-mer appears:

        .. math::

            T^{TCR \\beta} = \\sum_{TCR\\beta} C^{TCR\\beta}

            RA^{TCR\\beta} = \\frac{C^{TCR\\beta}}{T^{TCR\\beta}}

            RA = \\max_{\\underset{with \\, kmer}{TCR\\beta}} {RA^{TCR \\beta}}

        For more details, please see the original publication: Ostmeyer J, Christley S, Toby IT, Cowell LG. Biophysicochemical motifs in T cell
        receptor sequences distinguish repertoires from tumor-infiltrating lymphocytes and adjacent healthy tissue. Cancer Res. Published online
        January 1, 2019:canres.2292.2018. `doi:10.1158/0008-5472.CAN-18-2292 <https://cancerres.aacrjournals.org/content/canres/79/7/1671.full.pdf>`_

        Arguments:

            sequences: an array of (amino acid) sequences (corresponding to a repertoire)
            counts: an array of counts for each of the sequences
            k: the length of the k-mer (in the publication referenced above, k is 4)

        Returns:

            a dictionary where keys are k-mers and values are their relative abundances in the given list of sequences

        """
        relative_abundance = {}
        total_count = np.sum(counts)
        # RA per receptor sequence: C / T
        relative_abundance_per_sequence = counts / total_count
        for index, sequence in enumerate(sequences):
            kmers = KmerHelper.create_kmers_from_string(sequence, k)
            for kmer in kmers:
                # keep the maximum abundance over all sequences that contain this k-mer
                if kmer not in relative_abundance or relative_abundance[kmer] < relative_abundance_per_sequence[index]:
                    relative_abundance[kmer] = relative_abundance_per_sequence[index]

        return relative_abundance
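
To make the max-based definition concrete, here is a small dependency-free sketch on made-up data. It uses plain lists instead of NumPy arrays, and both `sliding_kmers` (a stand-in for `KmerHelper.create_kmers_from_string`) and `max_relative_abundance` are hypothetical names, not part of immuneML.

```python
def sliding_kmers(sequence, k):
    """Hypothetical stand-in for KmerHelper.create_kmers_from_string."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def max_relative_abundance(sequences, counts, k):
    """Each k-mer gets the relative abundance of its most frequent carrier sequence."""
    total = sum(counts)
    result = {}
    for sequence, count in zip(sequences, counts):
        ra = count / total  # RA of this receptor sequence: C / T
        for kmer in sliding_kmers(sequence, k):
            # Keep the largest per-sequence abundance seen for this k-mer.
            result[kmer] = max(result.get(kmer, 0.0), ra)
    return result


# Toy data, invented for illustration: two sequences sharing the 3-mer "ASS".
ra = max_relative_abundance(["CASS", "ASSL"], [3, 1], 3)
print(ra)  # {'CAS': 0.75, 'ASS': 0.75, 'SSL': 0.25}
```

Here T = 4, so "CASS" has abundance 3/4 and "ASSL" has 1/4. The shared k-mer "ASS" takes the larger of the two, 0.75, rather than their sum.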

Example #3

File: Util.py Project: uio-bmi/immuneML

    def compute_relative_abundance(sequences: np.ndarray, counts: np.ndarray, k: int) -> dict:
        """
        Computes the relative abundance of k-mers in the repertoire according to the following equations, where C is the template count, T is the
        total count and RA is the relative abundance (the output of the function for each k-mer separately):

        .. math::

            C^{kmer} = \\sum_{\\underset{with \\, kmer}{TCR \\beta}} C^{TCR \\beta}

            T^{kmer} = \\sum_{kmer} C^{kmer}

            RA = \\frac{C^{kmer}}{T^{kmer}}

        For more details, please see the original publication: Ostmeyer J, Christley S, Toby IT, Cowell LG. Biophysicochemical motifs in T cell
        receptor sequences distinguish repertoires from tumor-infiltrating lymphocytes and adjacent healthy tissue. Cancer Res. Published online
        January 1, 2019:canres.2292.2018. `doi:10.1158/0008-5472.CAN-18-2292 <https://cancerres.aacrjournals.org/content/canres/79/7/1671.full.pdf>`_

        Arguments:

            sequences: an array of (amino acid) sequences (corresponding to a repertoire)
            counts: an array of counts for each of the sequences
            k: the length of the k-mer (in the publication referenced above, k is 4)

        Returns:

            a dictionary where keys are k-mers and values are their relative abundances in the given list of sequences

        """

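        # C^{kmer}: summed template counts of all sequences that contain the k-mer
        c_kmers = Counter()
        for index, sequence in enumerate(sequences):
            kmers = KmerHelper.create_kmers_from_string(sequence, k)
            # the dict holds each k-mer once per sequence, so repeats within a sequence are not double-counted
            c_kmers += {kmer: counts[index] for kmer in kmers}

        # T^{kmer}: total over all k-mers, used for normalization
        t_kmers = sum(c_kmers.values())

        return {kmer: c_kmers[kmer] / t_kmers for kmer in c_kmers.keys()}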
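
The sum-based variant can be traced on the same kind of toy input. The sketch below is dependency-free and uses hypothetical names (`sliding_kmers` as a stand-in for `KmerHelper.create_kmers_from_string`, and `summed_relative_abundance`); the data is invented for illustration.

```python
from collections import Counter


def sliding_kmers(sequence, k):
    """Hypothetical stand-in for KmerHelper.create_kmers_from_string."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def summed_relative_abundance(sequences, counts, k):
    """RA = C^kmer / T^kmer, with C^kmer summed over sequences containing the k-mer."""
    c_kmers = Counter()
    for sequence, count in zip(sequences, counts):
        # set() counts a k-mer once per sequence even if it repeats within it.
        for kmer in set(sliding_kmers(sequence, k)):
            c_kmers[kmer] += count
    t_kmers = sum(c_kmers.values())
    return {kmer: c / t_kmers for kmer, c in c_kmers.items()}


# Toy data, invented for illustration: two sequences sharing the 3-mer "ASS".
ra = summed_relative_abundance(["CASS", "ASSL"], [3, 1], 3)
print(ra)
```

With these counts, C is 3 for "CAS", 4 for "ASS" and 1 for "SSL", so T = 8 and the abundances are 0.375, 0.5 and 0.125. Unlike the max-based variant, the values here sum to 1 across all k-mers.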