# y     &= \textbf{softmax}(h_{n}W_{hy} + b)
# \end{align*}$$
#
# where $1 \leqslant t \leqslant n$. As indicated in the above diagram, the sequence of hidden states is padded with an initial state $h_{0}$. In our implementations, this is always an all-$0$ vector, but it can be initialized in more sophisticated ways (some of which we will explore in our unit on natural language inference).
#
# This is a potential gain over our sum-the-word-vectors baseline, in that it processes each word individually and in the context of those that came before it. Thus, not only is this model sensitive to word order, but its hidden representations give us the potential to encode how the preceding context for a word affects its interpretation.
#
# The downside of this, of course, is that this model is much more difficult to set up and optimize. Let's dive into those details.
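# Before we do, it can help to see the equations as code. The following is a minimal NumPy sketch of the forward pass they define, nothing more; the function name is ours, the weights are assumed to be given, and the real implementations below handle batching, training, and the rest:

import numpy as np

def rnn_classifier_forward(X, W_xh, W_hh, W_hy, b):
    """`X` is the sequence of input vectors x_1 ... x_n; the weight
    matrices and bias are assumed to be already trained."""
    # h_0: the all-0 initial hidden state discussed above.
    h = np.zeros(W_hh.shape[0])
    # h_t = tanh(x_t W_xh + h_{t-1} W_hh) for 1 <= t <= n, so each
    # hidden state mixes the current word with the preceding context.
    for x in X:
        h = np.tanh(x.dot(W_xh) + h.dot(W_hh))
    # y = softmax(h_n W_hy + b): read the prediction off the final state.
    scores = h.dot(W_hy) + b
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()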

# ### RNN dataset preparation
#
# SST contains trees, but the RNN processes just the sequence of leaf nodes. The function `sst.build_rnn_dataset` creates datasets in this format:

# In[16]:

X_rnn_train, y_rnn_train = sst.build_rnn_dataset(
    SST_HOME, sst.train_reader, class_func=sst.ternary_class_func)

# Each member of `X_rnn_train` is a list of words, so `X_rnn_train` itself is a list of lists. Here's a look at the start of the first example:

# In[17]:

X_rnn_train[0][:6]
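# These are the leaf nodes of the corresponding SST tree, in order. As an illustration of that relationship (a sketch with a made-up sentence, assuming NLTK's `Tree` class, which the `sst` readers are built on):

from nltk.tree import Tree

toy_tree = Tree.fromstring("(4 (2 NLU) (4 (2 is) (4 enlightening)))")

# The RNN model sees only the terminal symbols, left to right:
toy_tree.leaves()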

# Because this is a classification task, `y_rnn_train` is just a list of labels, one per example:

# In[18]:

y_rnn_train[0]
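# If you want to sanity-check the full label set and its balance, a quick sketch using only the standard library:

from collections import Counter

Counter(y_rnn_train)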

# For experiments, let's build a `dev` dataset as well:
# In[19]:

X_rnn_dev, y_rnn_dev = sst.build_rnn_dataset(
    SST_HOME, sst.dev_reader, class_func=sst.ternary_class_func)
# The course repository's unit tests check that these splits have the expected sizes: 1,101 trees in the full dev set, and 6,920 train trees once the binary labeling filters out neutral examples. Lightly cleaned up (distinct names, this notebook's calling convention), they look like this:

def test_build_rnn_dataset_dev():
    X, y = sst.build_rnn_dataset(
        SST_HOME, sst.dev_reader, class_func=sst.ternary_class_func)
    assert len(X) == 1101
    assert len(y) == 1101

def test_build_rnn_dataset_train():
    X, y = sst.build_rnn_dataset(
        SST_HOME, sst.train_reader, class_func=sst.binary_class_func)
    assert len(X) == 6920
    assert len(y) == 6920
# For a more end-to-end usage example, here is an excerpt from `train_bert.py` in the bpm72/cs224u project, which uses `build_rnn_dataset` to turn the SST trees back into plain sentence strings with 0/1 labels for BERT fine-tuning. The excerpt is lightly cleaned up, with the imports it plausibly needs added (we assume Hugging Face's `transformers` for `BertTokenizer`); `get_logger`, `set_seed`, `TRAIN_FILE`, and `DEV_FILE` are defined elsewhere in that script and left as-is:
import os

import pandas as pd
from transformers import BertTokenizer

import sst
import utils

if __name__ == '__main__':

    logger = get_logger()

    set_seed(3)

    SST_HOME = os.path.join('', 'trees')

    train_df = pd.read_csv(TRAIN_FILE, encoding='utf-8', sep='\t')
    test_df = pd.read_csv(DEV_FILE, encoding='utf-8', sep='\t')
    bert_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

    # Rebuild the train split from the SST trees: join each leaf
    # sequence back into a single sentence string.
    X_train_lst, y_train_txt = sst.build_rnn_dataset(
        SST_HOME, sst.train_reader, class_func=sst.binary_class_func)
    X_train = [' '.join(words) for words in X_train_lst]

    # Map the string labels to integers: 'positive' -> 1, 'negative' -> 0.
    y_train = []
    for label in y_train_txt:
        if label == 'positive':
            y_train.append(1)
        else:
            y_train.append(0)

    # Note: this overwrites the `train_df` read from TRAIN_FILE above.
    train_df = pd.DataFrame({'sentence': X_train, 'label': y_train})

    # Same preparation for the test split:
    X_test_lst, y_test_txt = sst.build_rnn_dataset(
        SST_HOME, sst.test_reader, class_func=sst.binary_class_func)
    X_test = [' '.join(words) for words in X_test_lst]
    y_test = []