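Python tsne_viz examples

Programming language: Python

Namespace/package name: vsm

Method/function: tsne_viz

Examples at hotexamples.com: 2

Python tsne_viz - 2 examples found. These are the top-rated real-world Python examples of vsm.tsne_viz extracted from open source projects.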

Example #1

import vsm

def test_tsne_viz(df):
    vsm.tsne_viz(df)  # smoke test: plotting the fixture DataFrame should not raise

Example #2

File: vsm_01_distributional_trials1.py Project: abgoswam/cs224u

# * You can begin to get a feel for what your matrix is like by poking around with `vsm.neighbors` to see who is close to or far from whom.
#
# * It's very useful to complement this with the more holistic view one can get from looking at a visualization of the entire vector space.
#
# * Of course, any visualization will have to be much, much lower-dimensional than our actual VSM, so we need to proceed cautiously, balancing the high-level view with more fine-grained exploration.
#
# * We won't have time this term to cover VSM visualization in detail. scikit-learn has a bunch of functions for doing this in [sklearn.manifold](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.manifold), and the [user guide](http://scikit-learn.org/stable/modules/manifold.html#manifold-learning) for that package is detailed.
#
# * It's also worth checking out the online TensorFlow [Embedding Projector tool](http://projector.tensorflow.org), which includes a fast implementation of t-SNE.
#
# * In addition, `vsm.tsne_viz` is a wrapper around [sklearn.manifold.TSNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html#sklearn.manifold.TSNE) that handles the basic preprocessing and layout for you. t-SNE stands for [t-Distributed Stochastic Neighbor Embedding](http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf), a powerful method for visualizing high-dimensional vector spaces in 2d. See also [Multiple Maps t-SNE](https://lvdmaaten.github.io/multiplemaps/Multiple_maps_t-SNE/Multiple_maps_t-SNE.html).
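A minimal sketch of the neighbor exploration described in the first bullet, assuming the VSM is a pandas DataFrame with one row per word. The helpers `neighbors`, `cosine`, and `euclidean` below are hypothetical stand-ins written for illustration, not the `vsm` module's own code, and the three-word toy matrix is invented. The toy data also shows why the choice of distance function matters: Euclidean distance is dominated by overall frequency, while cosine distance compares only the direction of the vectors.

```python
import numpy as np
import pandas as pd


def euclidean(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance; sensitive to vector magnitude (i.e. raw frequency)."""
    return float(np.linalg.norm(u - v))


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cosine similarity; depends only on direction. Assumes no all-zero vectors."""
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def neighbors(word: str, df: pd.DataFrame, distfunc=cosine) -> pd.Series:
    """Hypothetical helper: distance from `word` to every other row, closest first."""
    target = df.loc[word].to_numpy(dtype=float)
    dists = df.apply(lambda row: distfunc(target, row.to_numpy(dtype=float)), axis=1)
    return dists.drop(word).sort_values()


# Toy word-by-context counts, invented purely for illustration.
toy = pd.DataFrame(
    [[10.0, 10.0, 0.0],   # frequent word
     [1.0, 1.0, 0.0],     # rare word with the same context profile as "good"
     [0.0, 2.0, 1.0]],    # rare word with a different context profile
    index=["good", "decent", "bad"],
    columns=["ctx_a", "ctx_b", "ctx_c"],
)

print(neighbors("decent", toy, distfunc=euclidean))
print(neighbors("decent", toy, distfunc=cosine))
```

With Euclidean distance, `decent` lands nearest `bad`, because both rows have small counts; with cosine distance it lands nearest `good`, which has the same context profile at ten times the frequency.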

# In[43]:

vsm.tsne_viz(imdb20_pmi, random_state=42)
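The cell above hands `imdb20_pmi` to `vsm.tsne_viz`. The sketch below shows roughly what a wrapper around `sklearn.manifold.TSNE` could look like; it is not the actual `vsm.tsne_viz` source. The helper names `tsne_coords` and `plot_coords` are hypothetical, and the PCA step (down to at most 50 dimensions before running t-SNE) is an assumption based on the scikit-learn TSNE documentation's recommendation for high-dimensional input.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def tsne_coords(df: pd.DataFrame, random_state=None, perplexity: float = 30.0) -> pd.DataFrame:
    """Hypothetical helper: project the rows of a word-by-feature DataFrame to 2-D."""
    X = df.to_numpy(dtype=float)
    # Assumed preprocessing: PCA to at most 50 dimensions before t-SNE.
    n_components = min(50, X.shape[0], X.shape[1])
    X_reduced = PCA(n_components=n_components, random_state=random_state).fit_transform(X)
    # t-SNE requires perplexity < n_samples, so clamp it for small vocabularies.
    tsne = TSNE(n_components=2,
                perplexity=min(perplexity, X.shape[0] - 1),
                random_state=random_state)
    coords = tsne.fit_transform(X_reduced)
    return pd.DataFrame(coords, index=df.index, columns=["x", "y"])


def plot_coords(coords: pd.DataFrame, figsize=(12, 12)):
    """Hypothetical helper: draw each word as a text label at its 2-D position."""
    import matplotlib.pyplot as plt  # lazy import so tsne_coords works without matplotlib

    fig, ax = plt.subplots(figsize=figsize)
    ax.scatter(coords["x"], coords["y"], s=0)  # invisible markers, used only to set axis limits
    for word, (x, y) in coords.iterrows():
        ax.annotate(str(word), (x, y), fontsize=8)
    ax.set_axis_off()
    return fig, ax
```

Something like `plot_coords(tsne_coords(imdb20_pmi, random_state=42))` would approximate the call above, assuming `imdb20_pmi` is a DataFrame indexed by word. Fixing `random_state` matters because t-SNE is stochastic: different seeds can produce visibly different layouts of the same data.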

# ## Exploratory exercises
#
# These are largely meant to give you a feel for the material, but some of them could lead to projects and help you with future work for the course. These are not for credit.
#
# 1. Recall that there are two versions each of the IMDB and Gigaword matrices: one with window size 5 and counts scaled as $1/d$ where $d$ is the distance from the target word; and one with a window size of 20 and no scaling of the values. Using `vsm.neighbors` to explore, how would you describe the impact of these different designs?
#
# 1. IMDB and Gigaword are very different domains. Using `vsm.neighbors`, can you find cases where the dominant sense of a word is clearly different in the two domains in a way that is reflected by vector-space proximity?
#
# 1. We saw that Euclidean distance favors raw frequencies. Find words in the matrix `imdb20` that help make this point: a pair of words that are semantically unrelated but close according to `vsm.euclidean`, and a pair that are semantically related but far apart according to `vsm.euclidean`.
#
# 1. Run
#
# ```amod = pd.read_csv(os.path.join(DATA_HOME, 'gigawordnyt-advmod-matrix.csv.gz'), index_col=0)```