Python RegexBuilder Examples

Programming Language: Python

Namespace/Package Name: gtts.tokenizer

Class/Type: RegexBuilder

Examples at hotexamples.com: 5

Python RegexBuilder - 5 examples found. These are the top rated real world Python examples of gtts.tokenizer.RegexBuilder extracted from open source projects. You can rate examples to help us improve the quality of examples.

Frequently Used Methods

Show Hide

RegexBuilder(5)

Frequently Used Methods

RegexBuilder (5)

Example #1

Show file

def tone_marks():
    """Keep tone-modifying punctuation by matching following character.

    Assumes the `tone_marks` pre-processor was run for cases where there might
    not be any space after a tone-modifying punctuation mark.
    """
    return RegexBuilder(pattern_args=symbols.TONE_MARKS,
                        pattern_func=lambda x: u"(?<={}).".format(x)).regex

Example #2

Show file

def legacy_all_punctuation(
):  # pragma: no cover b/c tested but Coveralls: ¯\_(ツ)_/¯
    """Match all punctuation.

    Use as only tokenizer case to mimic gTTS 1.x tokenization.
    """
    punc = symbols.ALL_PUNC
    return RegexBuilder(pattern_args=punc,
                        pattern_func=lambda x: u"{}".format(x)).regex

Example #3

Show file

def colon():
    """Colon case.

    Match a colon ":" only if not preceeded by a digit.
    Mainly to prevent a cut in the middle of time notations e.g. 10:01

    """
    return RegexBuilder(pattern_args=symbols.COLON,
                        pattern_func=lambda x: r"(?<!\d){}".format(x)).regex

Example #4

Show file

def other_punctuation():
    """Match other punctuation.

    Match other punctuation to split on; punctuation that naturally
    inserts a break in speech.

    """
    punc = ''.join((set(symbols.ALL_PUNC) - set(symbols.TONE_MARKS) -
                    set(symbols.PERIOD_COMMA)))
    return RegexBuilder(pattern_args=punc,
                        pattern_func=lambda x: u"{}".format(x)).regex

Example #5

Show file

def period_comma():
    """Period and comma case.

    Match if not preceded by ".<letter>" and only if followed by space.
    Won't cut in the middle/after dotted abbreviations; won't cut numbers.

    Note:
        Won't match if a dotted abbreviation ends a sentence.

    Note:
        Won't match the end of a sentence if not followed by a space.

    """
    return RegexBuilder(
        pattern_args=symbols.PERIOD_COMMA,
        pattern_func=lambda x: u"(?<!\.[a-z]){} ".format(x)).regex