INDEX
Explanations
mentions of diverse images or visual media
Negative Logits
edList     -0.21
lei        -0.18
edly       -0.17
edImage    -0.17
rees       -0.16
riad       -0.16
enant      -0.16
licht      -0.15
ugs        -0.15
ugh        -0.14
Positive Logits
0.23
axe
0.22
colo
0.22
asso
0.18
-per
0.17
ardon
0.17
quet
0.16
perfect
0.16
kee
0.16
dum
0.16
Activations Density 0.008%