Example #1
import tempfile

from torch_color_describer import ContextualColorDescriber


def test_torch_color_describer_save_load(dataset):
    color_seqs, word_seqs, vocab = dataset
    mod = ContextualColorDescriber(vocab,
                                   embed_dim=10,
                                   hidden_dim=10,
                                   max_iter=100,
                                   embedding=None)
    mod.fit(color_seqs, word_seqs)
    # Check that the fitted model can predict before the round-trip:
    mod.predict(color_seqs)
    with tempfile.NamedTemporaryFile(mode='wb') as f:
        name = f.name
        mod.to_pickle(name)
        mod2 = ContextualColorDescriber.from_pickle(name)
        # The reloaded model should support both prediction and re-fitting:
        mod2.predict(color_seqs)
        mod2.fit(color_seqs, word_seqs)
Example #2
# In[ ]:

_ = dev_mod.fit(dev_cols_train, dev_seqs_train)

# As discussed in [colors_overview.ipynb](colors_overview.ipynb), our primary metric is `listener_accuracy`:

# In[ ]:

dev_mod.listener_accuracy(dev_cols_test, dev_seqs_test)

# We can also see the model's predicted sequences given color context inputs:

# In[ ]:

dev_mod.predict(dev_cols_test[:1])

# In[ ]:

dev_seqs_test[:1]

# ## Question 3: GloVe embeddings [1 point]
#
# The above model uses a random initial embedding, as configured by the decoder used by `ContextualColorDescriber`. This homework question asks you to consider using GloVe inputs.
#
# __Your task__: Complete `create_glove_embedding` so that it creates a GloVe embedding based on your model vocabulary. This isn't meant to be analytically challenging, but rather just to create a basis for you to try out other kinds of rich initialization.

# In[ ]:

GLOVE_HOME = os.path.join('data', 'glove.6B')
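
# For reference, here is a minimal sketch of what `create_glove_embedding` might look like. It assumes the standard GloVe text format; the file name, the dimension, and the random fallback for tokens GloVe lacks (e.g., `<s>` and `</s>`) are illustrative choices, not the official starter code:

# In[ ]:

import numpy as np


def create_glove_embedding(vocab, glove_dim=50):
    """Build an embedding matrix whose rows follow the order of `vocab`."""
    glove_path = os.path.join(GLOVE_HOME, 'glove.6B.{}d.txt'.format(glove_dim))
    # Load the GloVe file into a word -> vector lookup:
    lookup = {}
    with open(glove_path, encoding='utf8') as f:
        for line in f:
            word, *vals = line.split()
            lookup[word] = np.array(vals, dtype=float)
    # Fall back to small random vectors for vocab items GloVe doesn't cover:
    return np.array([
        lookup.get(w, np.random.uniform(low=-0.5, high=0.5, size=glove_dim))
        for w in vocab])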
Example #3
toy_mod = ContextualColorDescriber(
    toy_vocab,
    batch_size=128,
    l2_strength=0.0,
    warm_start=False,
    device=None)

# In[31]:

_ = toy_mod.fit(toy_color_seqs_train, toy_word_seqs_train)

# ### Predicting sequences

# The `predict` method takes a list of color contexts as input and returns model descriptions:

# In[32]:

toy_preds = toy_mod.predict(toy_color_seqs_test)

# In[33]:

toy_preds[0]

# We can then check how many of the predicted sequences exactly match the corresponding test sequences:

# In[34]:

toy_correct = sum(1 for x, p in zip(toy_word_seqs_test, toy_preds) if x == p)

toy_correct / len(toy_word_seqs_test)

# For real problems, this is too stringent a requirement, since there are generally many equally good descriptions. This insight gives rise to metrics like [BLEU](https://en.wikipedia.org/wiki/BLEU), [METEOR](https://en.wikipedia.org/wiki/METEOR), [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)), [CIDEr](https://arxiv.org/pdf/1411.5726.pdf), and others, which seek to relax the requirement of an exact match with the test sequence. These are reasonable options to explore, but we will instead adopt a communication-based evaluation, as discussed in the next section.
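
# As a quick illustration of the n-gram relaxation, here is a minimal sketch of sentence-level BLEU, assuming `nltk` is installed and reusing the toy variables from above:

# In[ ]:

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smoother = SmoothingFunction().method1

# Mean sentence-level BLEU over the toy test set, with each test sequence
# serving as the single reference for its corresponding prediction:
toy_bleu = sum(
    sentence_bleu([ref], pred, smoothing_function=smoother)
    for ref, pred in zip(toy_word_seqs_test, toy_preds)) / len(toy_preds)

toy_bleu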