Explanations
references to dolls and their cultural significance
Negative Logits
engr     -0.15
á»ijn    -0.15
plib     -0.15
pis      -0.15
Snake    -0.15
oga      -0.14
_DX      -0.14
egers    -0.14
ñana     -0.14
badge    -0.14
Positive Logits
dolls    0.52
doll     0.51
Doll     0.46
doll     0.39
toy      0.30
toys     0.29
puppet   0.26
toy      0.24
plush    0.23
figures  0.23
Activations Density 0.053%