Python RegexpTokenizer.init Examples

Programming Language: Python

Namespace/Package Name: nltk.tokenize.regexp

Class/Type: RegexpTokenizer

Method/Function: __init__

Examples at hotexamples.com: 3

Python RegexpTokenizer.__init__ - 3 examples found. These are the top rated real world Python examples of nltk.tokenize.regexp.RegexpTokenizer.__init__ extracted from open source projects. You can rate examples to help us improve the quality of examples.

Frequently Used Methods

Show Hide

RegexpTokenizer(30)

tokenize(30)

span_tokenize(2)

__init__(1)

Example #1

Show file

File: morpho.py Project: marlovss/unsupmorpho

 def __init__(self):
     tokenizer_regexp = r'''(?ux)
           # the order of the patterns is important!!
           # more structured patterns come first
           [a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+|    # emails
           (?:https?://)?\w{2,}(?:\.\w{2,})+(?:/\w+)*|                  # URLs
           (?:[\#@]\w+)|                     # Hashtags and twitter user names
           (?:[^\W\d_]\.)+|                  # one letter abbreviations, e.g. E.U.A.
           (?:[DSds][Rr][Aa]?)\.|            # common abbreviations such as dr., sr., sra., dra.
           (?:\B-)?\d+(?:[:.,]\d+)*(?:-?\w)*|
           # numbers in format 999.999.999,999, possibly followed by hyphen and alphanumerics
           # \B- avoids picks as F-14 as a negative number
           \.{3,}|                           # ellipsis or sequences of dots
           \w+|                              # alphanumerics
           -+|                               # any sequence of dashes
           \S                                # any non-space character
    '''
     RegexpTokenizer.__init__(self, tokenizer_regexp)

Example #2

Show file

File: nlp.py Project: colgur/reader_pipeline

 def __init__(self):
    RegexpTokenizer.__init__(self, r'[\w+#]+|[^\w\s]+')

Example #3

Show file

File: text.py Project: bmuller/readembedability

 def __init__(self):
     RegexpTokenizer.__init__(self, r'\w+[-\w+]*|[^\w\s]+')

Python RegexpTokenizer.__init__ Examples

Python RegexpTokenizer.init Examples