def text2vec(self, train_data):
    """Convert each training document into a bag-of-features count vector.

    Parameters
    ----------
    train_data : object
        Must expose ``x_data``, an iterable of raw documents
        (presumably strings — confirm against the caller).

    Returns
    -------
    list[numpy.ndarray]
        One vector per document; entry ``i`` holds the occurrence count of
        the ``i``-th feature of ``self.vocabulary``. Features not present
        in the vocabulary are silently dropped.
    """
    # Map each vocabulary feature to its column index once, up front.
    idx_vocab = {word: idx for idx, word in enumerate(self.vocabulary)}
    documents = []
    for document in tqdm(train_data.x_data):
        doc_vocab = Vocabulary.getVocabularyByDocument(document, self.grams)
        occurrences = np.zeros(len(idx_vocab))
        for feature, count in doc_vocab.items():
            # Explicit membership check replaces the original bare
            # `except: continue`, which masked any unrelated error.
            idx = idx_vocab.get(feature)
            if idx is not None:
                occurrences[idx] += count
        documents.append(occurrences)
    return documents
def predict(self, x_test):
    """Predict a class label for each document using multinomial Naive Bayes.

    For every document the (log) posterior of each class is computed as
    log P(c) + sum_f count(f) * log(freq_c(f) + s)
            - len(doc) * log(total_c + |V_c| * s)
    where ``s`` is the smoothing constant, then the arg-max class is taken.

    Parameters
    ----------
    x_test : iterable
        Raw documents to classify.

    Returns
    -------
    list
        Predicted class label for each document, in input order.
    """
    smoothing = self.smoothing if self.smoothing else 0
    classes = np.unique(self.y_train)  # invariant across documents: hoisted
    predictions = []
    for document in x_test:
        # Bag-of-features representation of the document.
        vocab = Vocabulary.getVocabularyByDocument(document, self.grams)
        doc_len = sum(vocab.values())  # invariant across classes: hoisted
        probs = {}
        for cl in classes:
            occr = self.model['{}_occr'.format(cl)]
            total = self.model['{}_tot'.format(cl)]
            size_v = len(occr)
            # Start from the log prior P(c).
            prob = np.log(self.model['{}_Pc'.format(cl)])
            for feature, count in vocab.items():
                # dict.get replaces the original bare `except`, which
                # would have swallowed any error, not just KeyError.
                freq = occr.get(feature, 0)
                # Guard against log(0) when smoothing is 0; falling back
                # to a zero contribution mirrors the original behaviour.
                if freq + smoothing > 0:
                    prob += count * np.log(freq + smoothing)
            # Length normalisation term, applied once per class.
            probs[cl] = prob - doc_len * np.log(total + size_v * smoothing)
        predictions.append(max(probs, key=probs.get))
    return predictions