Python remove_punctuation Examples

Programming language: Python

Namespace/package name: functions

Method/function: remove_punctuation

Examples at hotexamples.com: 5

Python remove_punctuation: 5 examples found. These are the top-rated real-world Python examples of functions.remove_punctuation, extracted from open source projects.

Example #1

File: data_access.py Project: caglanakpinar/seo-nlp-page-rank-word-embeddings

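import pandas as pd

# remove_punctuation comes from the project's own functions module (not shown here).
def get_music_bio(params):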
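    # 'records' yields one dict per CSV row; the misspelled orient 'resutls' is rejected by current pandas.
    files = pd.read_csv(params['csv_path']).to_dict('records')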
    params['D'] = params['D'] if params['D'] else len(files)
    all_word_counts = {}
    for f in files[:params['D']]:
        line = f['content']
        s = remove_punctuation(line).lower().split()
        if len(s) > 1:
            for word in s:
                # Start each new word at 1 so the stored value is its true frequency.
                if word not in all_word_counts:
                    all_word_counts[word] = 1
                else:
                    all_word_counts[word] += 1
    params['V'] = params['V'] if params['V'] else len(all_word_counts)
    V = min(params['V'], len(all_word_counts))
    all_word_counts_idx = all_word_counts
    all_word_counts = sorted(all_word_counts.items(), key=lambda x: x[1], reverse=True)
    top_words = [w for w, count in all_word_counts[:V-1]] + ['<UNK>']
    word2idx = {w:i for i, w in enumerate(top_words)}
    all_word_counts_idx = {ind: all_word_counts_idx[w] if w != '<UNK>' else 0 for ind, w in enumerate(word2idx)}
    print("finished counting")
    unk = word2idx['<UNK>']
    sents = []
    sentences = []
    for f in files[:params['D']]:
        content = f['content']
        for sentence in content.split("."):
            sentence = remove_punctuation(sentence).lower()
            if len(sentence.split()) > 1:
                sent = [word2idx[w] if w in word2idx and w != ' ' else unk for w in sentence.split()]
                sentences.append(sentence)
                sents.append(sent)
    return sentences, sents, word2idx, all_word_counts_idx, params
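
The counting and indexing steps in the example above can be sketched more compactly with collections.Counter. This is only an illustration under stated assumptions: the remove_punctuation defined here is a hypothetical stand-in built on str.translate (the project's own helper is not shown), and build_vocab and encode are names invented for this sketch.

```python
import string
from collections import Counter

# Hypothetical stand-in for functions.remove_punctuation (the real helper is not shown).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(text):
    """Drop every ASCII punctuation character from text."""
    return text.translate(_PUNCT_TABLE)


def build_vocab(documents, vocab_size):
    """Return (word2idx, counts) for the most frequent words plus '<UNK>'."""
    counts = Counter()
    for doc in documents:
        counts.update(remove_punctuation(doc).lower().split())
    # Reserve one slot for the unknown-word token, as the example above does.
    size = min(vocab_size, len(counts))
    top_words = [w for w, _ in counts.most_common(max(size - 1, 0))] + ["<UNK>"]
    word2idx = {w: i for i, w in enumerate(top_words)}
    return word2idx, counts


def encode(sentence, word2idx):
    """Map each cleaned word to its index, falling back to '<UNK>'."""
    unk = word2idx["<UNK>"]
    cleaned = remove_punctuation(sentence).lower().split()
    return [word2idx.get(w, unk) for w in cleaned]


docs = ["the cat. the cat! the dog?"]
word2idx, counts = build_vocab(docs, vocab_size=3)
print(word2idx)                              # {'the': 0, 'cat': 1, '<UNK>': 2}
print(encode("The dog, the cat", word2idx))  # [0, 2, 0, 1]
```

Counter starts every word at a count of 1 on first sight, so the frequencies are exact without a separate initialization branch.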

Example #2

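import codecs
import copy

import functions  # project module that provides remove_punctuation and the other helpers


def main():
    # This file contains tasks 1.1 - 1.6.
    # text_file is a module-level path defined elsewhere in the project.
    with codecs.open(text_file, "r", "utf-8") as f:
        paragraphs = functions.makeParagraphArray(f)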

    # Removes "Gutenberg"/"gutenberg" and keeps a shallow copy of the untokenized paragraphs
    paragraphs = functions.remove_specific_word("Gutenberg", paragraphs)
    paragraphs = functions.remove_specific_word("gutenberg", paragraphs)
    par_copy = copy.copy(paragraphs)

    paragraphs = functions.tokenize(paragraphs)
    paragraphs = functions.remove_punctuation(paragraphs)
    paragraphs = functions.stem(paragraphs)

    return par_copy, paragraphs
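
In this example remove_punctuation receives the output of functions.tokenize rather than a single string. The project's implementation is not shown, so the following is a minimal sketch that assumes paragraphs is a list of token lists; dropping tokens that become empty is also an assumption of this sketch.

```python
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(paragraphs):
    """Strip punctuation from every token and drop tokens that become empty.

    Assumes paragraphs is a list of token lists, e.g. [["Hello,", "world"], ...].
    """
    cleaned = []
    for tokens in paragraphs:
        stripped = (tok.translate(_PUNCT_TABLE) for tok in tokens)
        cleaned.append([tok for tok in stripped if tok])
    return cleaned


paragraphs = [["Hello,", "world", "!"], ["It's", "fine."]]
print(remove_punctuation(paragraphs))  # [['Hello', 'world'], ['Its', 'fine']]
```

The sketch returns a new nested list, so a copy taken earlier (such as par_copy) is left untouched.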

Example #3

File: wordcount_transform.py Project: online-developer/git-test

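from pyspark.sql.functions import explode, split

# udfStr is a project module (not shown). sc must expose .read, so it is a
# SparkSession or SQLContext here, not a bare SparkContext.
def main(sc, argv):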
    filename = argv[1]
    # threshold = int(argv[2])

    dfTextFile = sc.read.text(filename)
    wordCount = dfTextFile \
                 .select(explode(split(dfTextFile.value, ' ')).alias('word')) \
                 .transform(udfStr.remove_punctuation('word')) \
                 .groupBy('word') \
                 .count() \
                 .collect()
    print('-' * 50)
    # wordCount.select('word').show()

    for w in sorted(wordCount, key=lambda x: x[1]):
        print(w)

    print('-' * 50)

Example #4

File: test_functions.py Project: emejiatr/Cogs-18-Chatbot

def remove_test():
    assert remove_punctuation('!!!Hello!@#?') == 'Hello'

Example #5

File: test_functions.py Project: luluzhu9/COGS-18-Final-Project

def test_remove_punctuation():
    assert callable(remove_punctuation)
    assert remove_punctuation("hEllO,hOware!yOU") == "hellohowareyou"
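
The tests in Examples #4 and #5 come from different projects and expect different behavior: the first preserves case, the second also lowercases the result. Neither implementation is shown, so the following is a hypothetical sketch that covers both with an optional lowercase flag (a parameter invented here for illustration).

```python
import string


def remove_punctuation(text, lowercase=False):
    """Return text without ASCII punctuation, optionally lowercased."""
    cleaned = "".join(ch for ch in text if ch not in string.punctuation)
    return cleaned.lower() if lowercase else cleaned


# Behavior expected by the test in Example #4 (case preserved).
print(remove_punctuation("!!!Hello!@#?"))                      # Hello
# Behavior expected by the test in Example #5 (lowercased as well).
print(remove_punctuation("hEllO,hOware!yOU", lowercase=True))  # hellohowareyou
```

string.punctuation covers only ASCII punctuation; text with typographic quotes or other Unicode punctuation would need a broader character test.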