Python phraseMapping examples

Programming language: Python

Namespace/package name: buildBilingualDict

Method/function: phraseMapping

Examples on hotexamples.com: 3

Python phraseMapping: 3 examples found. These are real-world examples of buildBilingualDict.phraseMapping in Python, extracted from open-source projects.
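The body of buildBilingualDict.phraseMapping is not shown on this page. The examples only show that it takes the path of a phrase list ('../data/phrase.lst') and returns an object that is handed to other helpers. Below is a hypothetical sketch of what such a loader could look like. It assumes the file holds one multi-word phrase per line, with words separated by whitespace, and indexes the phrases by their first word so that a later pass can find candidate phrases quickly. The name phrase_mapping and the dict-of-sets layout are assumptions, not the project's actual implementation.

```python
from collections import defaultdict


def phrase_mapping(lines):
    """Hypothetical stand-in for buildBilingualDict.phraseMapping.

    Assumes one phrase per line, words separated by whitespace.
    Returns {first_word: set of phrases (as word tuples) starting with it}.
    """
    mapping = defaultdict(set)
    for line in lines:
        words = tuple(line.split())
        if len(words) < 2:  # skip blank lines and single words
            continue
        mapping[words[0]].add(words)
    return dict(mapping)


# With a real file (an open file yields lines): phrase_mapping(open('../data/phrase.lst'))
phrasemap = phrase_mapping(["new york", "new york city", "machine learning", ""])
print(sorted(phrasemap["new"]))
# → [('new', 'york'), ('new', 'york', 'city')]
```

Indexing by first word means a scanner only has to check the few phrases that can start at the current word, rather than the whole list.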

Example #1

File: preprocess_qc.py Project: xiayandi/Bilingual-Sentence-Classification

import buildBilingualDict

# readInDependencyTriples, get_english_raw_sentences_labels, mergeDependencyTree,
# word2phrase_sentencelevel and formDependencyTripleLine are helpers from the
# same project that are not shown here.


def word2phrase_filelevel(formatCorpusFile, depFile, outputPhraseCorpusFile, outputPhraseDepFile):
    """
    func: transform a word-based corpus into a phrase-based one
    :param formatCorpusFile: the file that needs to be transformed
    :param depFile: the corresponding dependency file
    :param outputPhraseCorpusFile: the transformed phrase-based output file
    :param outputPhraseDepFile: the transformed phrase-based dependency file
    :return: n/a
    """
    # load the phrase list; the path is relative to the current working directory
    phrasemap = buildBilingualDict.phraseMapping('../data/phrase.lst')
    sentences_triples = readInDependencyTriples(depFile)
    # only flbls is used below; clbls is unused
    sentences, clbls, flbls = get_english_raw_sentences_labels(formatCorpusFile)
    # every sentence must have exactly one set of dependency triples
    assert len(sentences) == len(sentences_triples)

    newDepinfo = []
    newCorpus = []

    for i, sent in enumerate(sentences):
        sent_triples = sentences_triples[i]
        # phrase-level versions of the dependency triples and of the sentence
        newdeps = mergeDependencyTree(sent_triples, sent, phrasemap)
        newsent = word2phrase_sentencelevel(sent, phrasemap)
        newDepinfo.append(formDependencyTripleLine(newdeps))
        # output line format: <label>\t<phrase-based sentence>
        newCorpus.append(flbls[i] + '\t' + newsent + '\n')

    with open(outputPhraseCorpusFile, 'w') as writer:
        writer.writelines(newCorpus)
    with open(outputPhraseDepFile, 'w') as writer:
        writer.writelines(newDepinfo)
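word2phrase_sentencelevel is defined elsewhere in the project and is not shown here. One common way to turn a word sequence into a phrase sequence is a greedy longest-match scan: at each position, take the longest listed phrase that starts there and join its words into a single token. The sketch below is hypothetical. It assumes a phrase map of the form {first_word: set of word tuples} and an underscore joiner, and neither assumption is confirmed by the source.

```python
def merge_phrases(words, phrasemap, joiner="_"):
    """Greedy longest-match merge of words into phrase tokens (hypothetical sketch).

    phrasemap: {first_word: set of phrases as word tuples}.
    """
    out = []
    i = 0
    while i < len(words):
        best = None
        # pick the longest known phrase that matches at position i
        for phrase in phrasemap.get(words[i], ()):
            n = len(phrase)
            if tuple(words[i:i + n]) == phrase and (best is None or n > len(best)):
                best = phrase
        if best:
            out.append(joiner.join(best))
            i += len(best)
        else:
            out.append(words[i])
            i += 1
    return out


phrasemap = {"new": {("new", "york"), ("new", "york", "city")}}
sentence = "i moved to new york city last year"
print(" ".join(merge_phrases(sentence.split(), phrasemap)))
# → i moved to new_york_city last year
```

Preferring the longest match keeps "new york city" from being split into "new_york" followed by a stray "city".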

Example #2

File: preprocess_qc.py Project: xiayandi/Bilingual-Sentence-Classification

import buildBilingualDict

# readInDependencyTriples, get_english_raw_sentences_labels, mergeDependencyTree,
# word2phrase_sentencelevel and formDependencyTripleLine are helpers from the
# same project that are not shown here.


def word2phrase_filelevel(formatCorpusFile, depFile, outputPhraseCorpusFile,
                          outputPhraseDepFile):
    """
    func: transform a word-based corpus into a phrase-based one
    :param formatCorpusFile: the file that needs to be transformed
    :param depFile: the corresponding dependency file
    :param outputPhraseCorpusFile: the transformed phrase-based output file
    :param outputPhraseDepFile: the transformed phrase-based dependency file
    :return: n/a
    """
    # load the phrase list; the path is relative to the current working directory
    phrasemap = buildBilingualDict.phraseMapping('../data/phrase.lst')
    sentences_triples = readInDependencyTriples(depFile)
    # only flbls is used below; clbls is unused
    sentences, clbls, flbls = get_english_raw_sentences_labels(
        formatCorpusFile)
    # every sentence must have exactly one set of dependency triples
    assert len(sentences) == len(sentences_triples)

    newDepinfo = []
    newCorpus = []

    for i, sent in enumerate(sentences):
        sent_triples = sentences_triples[i]
        # phrase-level versions of the dependency triples and of the sentence
        newdeps = mergeDependencyTree(sent_triples, sent, phrasemap)
        newsent = word2phrase_sentencelevel(sent, phrasemap)
        newDepinfo.append(formDependencyTripleLine(newdeps))
        # output line format: <label>\t<phrase-based sentence>
        newCorpus.append(flbls[i] + '\t' + newsent + '\n')

    with open(outputPhraseCorpusFile, 'w') as writer:
        writer.writelines(newCorpus)
    with open(outputPhraseDepFile, 'w') as writer:
        writer.writelines(newDepinfo)

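mergeDependencyTree is not shown on this page either. Whatever it does internally, merging several words into one phrase token shifts word positions, so any index-based dependency arcs have to be renumbered. The following sketches one plausible approach. It assumes triples of the form (relation, head, dependent) with 1-based word indices and 0 for ROOT. It also assumes merged tokens are the original words joined by "_" and that no original word contains "_". All of these are assumptions made for illustration.

```python
def index_map(merged_tokens, joiner="_"):
    """Map 1-based word positions to 1-based positions of merged tokens.

    Hypothetical: assumes each merged token is its original words joined by
    `joiner` and that no original word itself contains `joiner`.
    """
    mapping = {}
    word_pos = 1
    for tok_pos, token in enumerate(merged_tokens, start=1):
        for _ in token.split(joiner):
            mapping[word_pos] = tok_pos
            word_pos += 1
    return mapping


def remap_triples(triples, mapping):
    """Renumber (relation, head, dependent) triples for the merged tokens.

    Head 0 (ROOT) stays 0; arcs between words of the same phrase are dropped.
    """
    remapped = []
    for rel, head, dep in triples:
        new_head = 0 if head == 0 else mapping[head]
        new_dep = mapping[dep]
        if new_head != new_dep:  # an arc inside one phrase becomes a self-loop
            remapped.append((rel, new_head, new_dep))
    return remapped


tokens = ["i", "moved", "to", "new_york_city"]
triples = [("nsubj", 2, 1), ("root", 0, 2), ("prep", 2, 3),
           ("nn", 6, 4), ("nn", 6, 5), ("pobj", 3, 6)]
print(remap_triples(triples, index_map(tokens)))
# → [('nsubj', 2, 1), ('root', 0, 2), ('prep', 2, 3), ('pobj', 3, 4)]
```

Arcs that point into or out of a phrase are redirected to the phrase token. Arcs between words of the same phrase would become self-loops, so they are removed.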
Example #3

File: buildBilingualCorpus.py Project: xiayandi/Bilingual-Sentence-Classification

import buildBilingualDict

# english_word_filter and english_phrase_filter are helpers from the same
# project that are not shown here.


def preprocessEnglishCorpus(CorpusFile, preprocessedFile):
    """
    func: preprocess a corpus
    params: CorpusFile: corpus file path
    params: preprocessedFile: the output preprocessed file
    return: n/a
    """
    print('preprocessing...')
    # size hint for readlines(): about 250 million characters per batch
    buffsize = 250000000
    buffcount = 0
    # truncate (or create) the output file; batches are appended below
    open(preprocessedFile, 'w').close()
    phrasemap = buildBilingualDict.phraseMapping('../data/phrase.lst')
    with open(CorpusFile, 'r') as reader:
        while True:
            outputbuffer = []
            lines = reader.readlines(buffsize)
            if not lines:  # empty list means end of file
                break
            buffcount += 1
            print('building with ' + str(buffcount) + ' buffer.....')
            for line in lines:
                newwords = []
                # split each raw token into zero or more cleaned words
                for word in line.split():
                    newwords.extend(english_word_filter(word))
                # merge listed multi-word phrases
                newwords = english_phrase_filter(newwords, phrasemap)
                outputbuffer.append(' '.join(newwords) + '\n')
            print('writing buffer...')
            with open(preprocessedFile, 'a') as writer:
                writer.writelines(outputbuffer)
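preprocessEnglishCorpus bounds memory use by passing a size hint to readlines(). Each call returns whole lines, and no more lines are read once the total size of the lines read so far exceeds the hint. At end of file the call returns an empty list. The sketch below runs the same batching loop on an in-memory file with a tiny hint. The io.StringIO input, the hint of 10, and the lowercasing step (a stand-in for english_word_filter and english_phrase_filter) are purely illustrative.

```python
import io


def iter_batches(reader, hint):
    """Yield lists of whole lines; a batch stops once its size exceeds `hint`."""
    while True:
        lines = reader.readlines(hint)
        if not lines:  # empty list means end of file
            break
        yield lines


corpus = io.StringIO("".join("Line %d\n" % i for i in range(6)))
out = io.StringIO()
count = 0
for count, lines in enumerate(iter_batches(corpus, hint=10), start=1):
    # stand-in for the per-line word and phrase filtering
    processed = [" ".join(w.lower() for w in line.split()) + "\n" for line in lines]
    print("batch %d: %d lines" % (count, len(lines)))
    out.writelines(processed)

print(out.getvalue().splitlines()[:2])
# → ['line 0', 'line 1']
```

Because lines are never split across batches, the per-line processing is unaffected by where a batch boundary falls.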