Explanations
captions in images
mentions of image captions and their corresponding attributes
Negative Logits
<|endoftext|>   -0.74
aturdays        -0.69
akespeare       -0.68
onest           -0.67
bom             -0.61
recl            -0.61
territ          -0.60
ury             -0.60
stab            -0.59
reborn          -0.59
Positive Logits
GOODMAN          0.92
Gladiator        0.75
Phys             0.75
UTERS            0.73
Javascript       0.68
IMAGES           0.65
Mandatory        0.65
Immun            0.64
Ability          0.63
itars            0.63
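Feature dashboards of this kind commonly build the Negative Logits and Positive Logits lists by taking the feature's decoder direction, multiplying it by the model's unembedding matrix W_U, and reporting the tokens with the lowest and highest scores. The page does not state its exact method, so the sketch below is an assumption. It uses a hypothetical 4-token vocabulary and toy weights, and the function names `logit_effects` and `top_logits` are illustrative only.

```python
import heapq
from typing import Sequence


def logit_effects(decoder_row: Sequence[float], w_u: Sequence[Sequence[float]]) -> list[float]:
    """Score every vocabulary token by the dot product of the feature's
    decoder direction (length d_model) with that token's column of the
    unembedding matrix W_U (shape d_model x vocab)."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_row[d] * w_u[d][t] for d in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_logits(tokens: Sequence[str], scores: Sequence[float], k: int):
    """Return the k most positive and k most negative (token, score) pairs."""
    pairs = list(zip(tokens, scores))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    return positive, negative


# Toy example (hypothetical values): 4-token vocabulary, d_model = 2.
tokens = ["a", "b", "c", "d"]
decoder_row = [1.0, 0.5]
w_u = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -1.0, 0.1],
]
scores = logit_effects(decoder_row, w_u)
positive, negative = top_logits(tokens, scores, k=2)
print(positive)
print(negative)
```

Under this assumption, a token such as "GOODMAN" ranks high because the decoder direction aligns with its unembedding column. This is a direct effect on the output logits, not a measure of which contexts make the feature fire.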
Activation density: 0.094%
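Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is strictly positive. A value of 0.094% therefore means the feature fires on roughly one token in a thousand. The sample size and threshold behind this page's figure are not given, so the sketch below assumes a threshold of 0. The helper `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 94 non-zero activations out of 100,000 tokens.
sample = [0.0] * 100_000
for i in range(0, 94 * 1000, 1000):  # indices 0, 1000, ..., 93000 -> 94 tokens
    sample[i] = 1.5
print(f"{activation_density(sample):.3f}%")
```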