Python examples for TextCleaner

Programming language: Python

Namespace/package name: text_cleaner.text_cleaner

Class/type: TextCleaner

Examples on hotexamples.com: 2

TextCleaner in Python: 2 examples found. These are the top real-world Python examples of text_cleaner.text_cleaner.TextCleaner, extracted from open source projects.

Frequently used methods

clean(1)

Example #1

File: gram_freq.py Project: ytiralk01/collocation_discovery

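import json

from text_cleaner.text_cleaner import TextCleaner

# Note: stopwords, gram_tokenize and ngrams are not shown in this excerpt.
# They are defined or imported elsewhere in the original file; ngrams is
# assumed here to be nltk.util.ngrams.


class GramFreq:
    """provides the utility for basic corpus analytics; also supports advanced collocation mining abilities
    """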
    def __init__(self, n):
        """tracks the frequency distribution; n is the length of the desired grams to be computed and indexed
        """
        self.n = n
        self.freq = {}
        self.text_cleaner = TextCleaner()

    def index(self, document):
        """tokenizes a document, computes n-grams from that token stream and adds them to the freq dict
        """
        # clean and tokenize the incoming text, dropping stopwords
        tokens = [t for t in gram_tokenize(self.text_cleaner.clean(document)) if t not in stopwords]
        # build the set of distinct n-grams, so each gram is counted once per document
        grams = set(ngrams(tokens, self.n))
        for gram in grams:
            key = ' '.join(gram)
            self.freq[key] = self.freq.get(key, 0) + 1

    def dump(self, filename):
        """dumps the computed freq dict to disk as a JSON string
        """
        with open(filename, 'w') as outfile:
            json.dump(self.freq, outfile)

    def load(self, filename):
        """loads a previously computed freq dict from disk for use in analyses, etc.
        """
        # use the same json module as dump(); simplejson is never imported in this excerpt
        with open(filename, 'r') as infile:
            self.freq = json.load(infile)
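
The counting step in index() can be tried in isolation with the sketch below. It is not the project's code: SimpleCleaner, simple_ngrams, STOPWORDS and document_frequency are hypothetical stand-ins for TextCleaner, ngrams, stopwords and GramFreq.index, and the cleaning rule (lowercase, replace non-alphanumerics with spaces) is an assumption about what clean() might do.

```python
import re
from collections import Counter

# Hypothetical stand-ins: the real TextCleaner, gram_tokenize and stopwords
# live in the original project and may behave differently.
STOPWORDS = {"the", "a", "of", "and"}


class SimpleCleaner:
    """Assumed behaviour: lowercase and replace non-alphanumerics with spaces."""

    def clean(self, text):
        return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def simple_ngrams(tokens, n):
    """Yield consecutive n-token tuples from a list of tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def document_frequency(documents, n):
    """Count, for each n-gram, the number of documents that contain it."""
    cleaner = SimpleCleaner()
    freq = Counter()
    for document in documents:
        tokens = [t for t in cleaner.clean(document).split() if t not in STOPWORDS]
        # A set makes each gram count at most once per document.
        freq.update({" ".join(g) for g in simple_ngrams(tokens, n)})
    return dict(freq)


docs = ["The quick brown fox.", "A quick brown dog and a quick brown cat."]
print(document_frequency(docs, 2))
```

Because the grams are collected in a set before counting, "quick brown" scores 2 here (one per document) even though it occurs twice in the second document. Removing stopwords before building n-grams also joins tokens that were not adjacent in the text, which is why "dog quick" appears as a bigram.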

Example #2

File: gram_freq.py Project: ytiralk01/collocation_discovery

def __init__(self, n):
    """tracks the frequency distribution; n is the length of the desired grams to be computed and indexed
    """
    self.n = n
    self.freq = {}
    self.text_cleaner = TextCleaner()