Python GensimCorpus Examples

Programming language: Python

Namespace/package name: model.create_corpus

Class/type: GensimCorpus

Examples on hotexamples.com: 6

Python GensimCorpus - 6 examples found. These are the top-rated real-world Python examples of model.create_corpus.GensimCorpus, extracted from open source projects.

Frequently used methods

GensimCorpus(3)

createCorpus(1)

Example #1

File: main.py Project: royshan/gensimLite

from model.create_corpus import GensimCorpus


def dictionaryGen(data_fp, dict_fp):
    '''
    Generates a gensim dictionary from the data at data_fp
    and saves it to dict_fp.
    '''
    g = GensimCorpus(data_fp)
    # load the data as JSON and convert it to tuples
    g.loadjson().json2tuple()
    # tokenize the data
    g.tokenizeData()
    # create the dictionary and filter out low-frequency words
    print('creating dictionary...')
    g.createDictionary().filterFrequency(n=1)
    # save the dictionary
    print('saving dictionary to %s' % dict_fp)
    g.saveDictionary(dict_fp)
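
The methods called above belong to the project's GensimCorpus class, whose source is not shown on this page. As a rough illustration of what a dictionary step of this kind amounts to, the sketch below builds a token-to-id mapping from tokenized documents and drops tokens that occur in too few documents. It uses only the standard library. `build_dictionary` and `min_docs` are hypothetical names, and the threshold rule is an assumption, not necessarily what `filterFrequency(n=1)` does.

```python
from collections import Counter


def build_dictionary(tokenized_docs, min_docs=2):
    """Map each token to an integer id, keeping only tokens that
    appear in at least `min_docs` documents (assumed filtering rule)."""
    doc_freq = Counter()
    for tokens in tokenized_docs:
        # count each token at most once per document
        doc_freq.update(set(tokens))
    # sort the surviving tokens so that ids are assigned deterministically
    kept = sorted(tok for tok, df in doc_freq.items() if df >= min_docs)
    return {tok: idx for idx, tok in enumerate(kept)}


# hypothetical toy input: three already-tokenized documents
docs = [
    ["topic", "model", "corpus"],
    ["corpus", "dictionary", "model"],
    ["dictionary", "token"],
]
token2id = build_dictionary(docs, min_docs=2)
print(token2id)  # {'corpus': 0, 'dictionary': 1, 'model': 2}
```

In gensim itself, a comparable mapping is provided by `gensim.corpora.Dictionary`, which can prune rare tokens with `filter_extremes`. Whether GensimCorpus wraps those calls is not confirmed by the excerpt.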

Example #2

File: main.py Project: PeggedSoftware/gensimLite

from model.create_corpus import GensimCorpus


def dictionaryGen(data_fp, dict_fp):
    '''
    Generates a gensim dictionary from the data at data_fp
    and saves it to dict_fp.
    '''
    g = GensimCorpus(data_fp)
    # load the data as JSON and convert it to tuples
    g.loadjson().json2tuple()
    # tokenize the data
    g.tokenizeData()
    # create the dictionary and filter out low-frequency words
    print('creating dictionary...')
    g.createDictionary().filterFrequency(n=1)
    # save the dictionary
    print('saving dictionary to %s' % dict_fp)
    g.saveDictionary(dict_fp)

Example #3

File: main.py Project: PeggedSoftware/gensimLite

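from model.create_corpus import GensimCorpus
# LdaModel here appears to be the project's own wrapper (it exposes
# streamParams and batchParams), not gensim.models.LdaModel; its import
# is not shown in the source.


def ldaGen(dict_fp, corpus_fp, model_fp,
           streamParameters=None, batchParameters=None):
    g = GensimCorpus()
    dictionary = g.loadDictionary(dict_fp)
    corpus = g.loadCorpus(corpus_fp)
    lda = LdaModel(corpus, dictionary)

    # choose streaming or batch parameters
    if streamParameters:
        params = lda.streamParams(**streamParameters)
    elif batchParameters:
        params = lda.batchParams(**batchParameters)
    else:
        # neither parameter set was given, so there is nothing to run
        raise ValueError('please specify streaming or batch lda')

    # set params, run the model, and save it
    lda.setParams(params)
    lda.runLda()
    lda.save_model(model_fp)
    return lda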

Example #4

File: main.py Project: royshan/gensimLite

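from model.create_corpus import GensimCorpus
# LdaModel here appears to be the project's own wrapper (it exposes
# streamParams and batchParams), not gensim.models.LdaModel; its import
# is not shown in the source.


def ldaGen(dict_fp,
           corpus_fp,
           model_fp,
           streamParameters=None,
           batchParameters=None):
    g = GensimCorpus()
    dictionary = g.loadDictionary(dict_fp)
    corpus = g.loadCorpus(corpus_fp)
    lda = LdaModel(corpus, dictionary)

    # choose streaming or batch parameters
    if streamParameters:
        params = lda.streamParams(**streamParameters)
    elif batchParameters:
        params = lda.batchParams(**batchParameters)
    else:
        # neither parameter set was given, so there is nothing to run
        raise ValueError('please specify streaming or batch lda')

    # set params, run the model, and save it
    lda.setParams(params)
    lda.runLda()
    lda.save_model(model_fp)
    return lda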

Example #5

File: main.py Project: royshan/gensimLite

from model.create_corpus import GensimCorpus


def corpusGen(data_fp, dict_fp, corpus_fp):
    '''
    Generates a gensim corpus from the data at data_fp using the
    gensim dictionary file at dict_fp, and saves it to corpus_fp.
    '''
    # load the data
    g = GensimCorpus(data_fp)
    g.loadjson().json2tuple()
    # tokenize the data
    g.tokenizeData()
    # load the dictionary
    g.loadDictionary(dict_fp)
    # create the corpus from the text part of each (tag, text) pair
    print('creating corpus...')
    g.createCorpus([text for tag, text in g.data])
    # save the corpus
    print('saving corpus to %s' % corpus_fp)
    g.saveCorpus(corpus_fp)
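
The `createCorpus` call above receives the text part of each `(tag, text)` pair in `g.data`. Its implementation is not shown here, but a gensim-style corpus is a list of bag-of-words vectors, one per document, each a list of `(token_id, count)` pairs. The standard-library sketch below illustrates that conversion, assuming the texts are already tokenized. `to_bow` and the sample data are hypothetical.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert one tokenized document to sorted (token_id, count) pairs,
    ignoring tokens that are not in the dictionary."""
    counts = Counter(token2id[tok] for tok in tokens if tok in token2id)
    return sorted(counts.items())


# hypothetical (tag, tokens) pairs, mirroring the shape unpacked from g.data
data = [
    ("doc1", ["corpus", "model", "corpus"]),
    ("doc2", ["dictionary", "unknown", "model"]),
]
# hypothetical token-to-id mapping standing in for the loaded dictionary
token2id = {"corpus": 0, "dictionary": 1, "model": 2}

corpus = [to_bow(text, token2id) for tag, text in data]
print(corpus)  # [[(0, 2), (2, 1)], [(1, 1), (2, 1)]]
```

In gensim, `Dictionary.doc2bow` performs this kind of conversion. The sketch only mirrors the idea and makes no claim about how GensimCorpus builds or stores its corpus.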

Example #6

File: main.py Project: PeggedSoftware/gensimLite

from model.create_corpus import GensimCorpus


def corpusGen(data_fp, dict_fp, corpus_fp):
    '''
    Generates a gensim corpus from the data at data_fp using the
    gensim dictionary file at dict_fp, and saves it to corpus_fp.
    '''
    # load the data
    g = GensimCorpus(data_fp)
    g.loadjson().json2tuple()
    # tokenize the data
    g.tokenizeData()
    # load the dictionary
    g.loadDictionary(dict_fp)
    # create the corpus from the text part of each (tag, text) pair
    print('creating corpus...')
    g.createCorpus([text for tag, text in g.data])
    # save the corpus
    print('saving corpus to %s' % corpus_fp)
    g.saveCorpus(corpus_fp)