INDEX
Explanations
images with captions
the word "this"
NEGATIVE LOGITS
Sword     -0.67
NAS       -0.67
76561     -0.67
APD       -0.66
Vand      -0.66
ORED      -0.65
Nadu      -0.65
Sund      -0.65
Rounds    -0.65
uality    -0.64
POSITIVE LOGITS
toggle        0.87
WATCHED       0.77
image         0.74
CASE          0.73
transcript    0.73
month         0.72
ebin          0.72
ARTICLE       0.71
page          0.69
slide         0.68
Activations Density: 0.020%