Code Example #1
File: gen_yahoo.py  Project: zhangyafeikimi/ml-pack
import sys

# collect_files, stem_file and Vocabulary come from the same project; the
# excerpt starts at the command-line check, so the guard below is restored.
if __name__ == '__main__':
    if len(sys.argv) <= 3:
        print >> sys.stderr, '%s [stop word file] [output name] ' \
                             '[doc file] ...' % sys.argv[0]
        sys.exit(1)

    file_list = []
    for _dir in sys.argv[3:]:
        collect_files(file_list, _dir)

    stop_word = Vocabulary()
    stop_word.load(sys.argv[1])
    vocab = Vocabulary()
    articles = []

    for filename in file_list:
        article = stem_file(filename, vocab, stop_word)
        articles.append(article)
    # random.shuffle(articles)

    vocab.sort()
    vocab.save(sys.argv[2] + '-vocab')

    fp = open(sys.argv[2] + '-train', 'w')
    for article in articles:
        # one document per line: space-separated token ids, word order kept
        fp.write(' '.join('%d' % vocab.get_id_from_token(word)
                          for word in article))
        fp.write('\n')
    fp.close()
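
The Vocabulary class used above (and in the next example) is defined elsewhere in the projects and is not shown in the excerpts. A minimal sketch of the interface the snippets call (load, save, sort, get_id_from_token) follows; the method names come from the snippets themselves, but the bodies and the has_token() membership check are assumptions.

# Hypothetical sketch of the Vocabulary interface used by both snippets.
class Vocabulary(object):
    def __init__(self):
        self.token_to_id = {}
        self.tokens = []

    def get_id_from_token(self, token):
        # assign the next free id the first time a token is seen
        if token not in self.token_to_id:
            self.token_to_id[token] = len(self.tokens)
            self.tokens.append(token)
        return self.token_to_id[token]

    def has_token(self, token):
        return token in self.token_to_id

    def load(self, filename):
        # one token per line; ids follow file order
        fp = open(filename)
        for line in fp:
            self.get_id_from_token(line.strip())
        fp.close()

    def save(self, filename):
        fp = open(filename, 'w')
        for token in self.tokens:
            fp.write(token + '\n')
        fp.close()

    def sort(self):
        # re-assign ids in lexicographic token order
        self.tokens.sort()
        self.token_to_id = dict((t, i) for i, t in enumerate(self.tokens))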
Code Example #2
File: lda_doc_proc.py  Project: kurff/ml-lda
import sys

def stem_file(filename, vocab, stop_word):
    # Reconstructed head (the excerpt begins mid-function): read the
    # document, stem each token, skip stop words, register the token in
    # the vocabulary, and count occurrences. stem() and has_token() are
    # assumed helpers; only the last four lines are from the original.
    word_count = {}
    infile = open(filename, 'r')
    for line in infile:
        for word in line.split():
            stemmed_word = stem(word)  # hypothetical stemming helper
            if not stop_word.has_token(stemmed_word):  # hypothetical check
                vocab.get_id_from_token(stemmed_word)  # register in vocab
                count = word_count.get(stemmed_word, 0) + 1
                word_count[stemmed_word] = count
    infile.close()
    return word_count

if __name__ == '__main__':
    if len(sys.argv) <= 2:
        print >>sys.stderr, '%s [stop word file] [doc file] ...' % sys.argv[0]
        sys.exit(1)

    stop_word = Vocabulary()
    stop_word.load(sys.argv[1])
    vocab = Vocabulary()
    word_count_list = []

    for filename in sys.argv[2:]:
        word_count = stem_file(filename, vocab, stop_word)
        word_count_list.append(word_count)
    vocab.sort()
    vocab.save('train.vocab')

    fp = open('train', 'w')
    for word_count in word_count_list:
        for word in word_count.keys():
            word_id = vocab.get_id_from_token(word)  # avoid shadowing builtin id()
            count = word_count[word]
            fp.write('%d:%d ' % (word_id, count))
        fp.write('\n')
    fp.close()
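
The two scripts emit different training formats: gen_yahoo.py writes each document as a space-separated sequence of token ids with word order preserved, while lda_doc_proc.py writes one bag-of-words line of id:count pairs per document, the sparse format many LDA trainers expect. A minimal sketch for reading the sparse file back into per-document count dictionaries, assuming the 'train' filename used above:

# Parse the sparse output: one document per line, tokens encoded as 'id:count'.
def load_sparse_docs(filename='train'):
    docs = []
    fp = open(filename)
    for line in fp:
        doc = {}
        for pair in line.split():
            word_id, count = pair.split(':')
            doc[int(word_id)] = int(count)
        docs.append(doc)
    fp.close()
    return docs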