Question about Figure 5 in the arXiv paper
#3 · opened by BestWishYsh
We used the t-SNE implementation provided by scikit-learn with its default parameters. The code is as follows:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.load('emb.np')
plt.rcParams['ytick.labelsize'] = 24
plt.rcParams['xtick.labelsize'] = 24
y = np.concatenate([[0] * 1000, [1] * 1000, [2] * 1000])

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X_pca)

fig, ax = plt.subplots(figsize=(10, 8))
colors = ['g', 'b', 'r']
markers = ['o', 's', '^']
for side in ['top', 'bottom', 'left', 'right']:
    ax.spines[side].set_color('black')

names = ['Text', 'Image', 'Text+Image']
for i, c, m in zip(range(3), colors, markers):
    plt.scatter(X_tsne[y == i, 0], X_tsne[y == i, 1],
                color=c, marker=m, label=names[i], alpha=1.0)
plt.xticks([])
plt.yticks([])
ax.legend(fontsize=24, loc='lower right')
plt.show()
```
@zyznull
Could you provide the full code? In particular, how is `X_pca` obtained, and how is `from sklearn.decomposition import PCA` used? Thanks!
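Not an official answer, but the snippet above imports `PCA` without using it, so a common preprocessing pattern that could produce `X_pca` is reducing the embeddings with PCA before running t-SNE (50 components is a frequent default). A minimal sketch with placeholder data standing in for `emb.np`:

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for the loaded embeddings; the real shape/dim is an assumption.
rng = np.random.default_rng(42)
X = rng.normal(size=(3000, 512))  # 3 classes x 1000 samples, 512-d features

# Reduce dimensionality with PCA before t-SNE (a common, but here assumed, step).
pca = PCA(n_components=50, random_state=42)
X_pca = pca.fit_transform(X)
print(X_pca.shape)  # (3000, 50)
```

`X_pca` would then be passed to `tsne.fit_transform` as in the posted code; whether the authors actually used this step (or these parameters) would need their confirmation.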
I am also confused about how to obtain the text-image fusion features of CLIP, since its text encoder and image encoder are separate.
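Since CLIP's two encoders produce separate embeddings, fusion is typically done after encoding. Two common (assumed, not confirmed by the authors) strategies are averaging the L2-normalized embeddings or concatenating them; a sketch on placeholder arrays standing in for the encoder outputs:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale each row to unit L2 norm, as CLIP embeddings usually are."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Placeholders for CLIP text/image encoder outputs (shapes are assumptions).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(1000, 512))
image_emb = rng.normal(size=(1000, 512))

# Option 1: mean of the normalized embeddings, re-normalized (same dim).
fused_mean = l2_normalize(l2_normalize(text_emb) + l2_normalize(image_emb))

# Option 2: concatenation along the feature axis (doubles the dim).
fused_concat = np.concatenate(
    [l2_normalize(text_emb), l2_normalize(image_emb)], axis=1)

print(fused_mean.shape, fused_concat.shape)  # (1000, 512) (1000, 1024)
```

Which fusion the paper used for the "Text+Image" cluster in Figure 5 would need clarification from the authors.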