INDEX
Explanations
frequent references to specific names or proper nouns
names of individuals and references to dolls
Negative Logits
rament     -0.89
gers       -0.87
AMI        -0.82
riors      -0.80
uld        -0.74
lar        -0.74
raltar     -0.71
rid        -0.71
arching    -0.70
lopp       -0.69
Positive Logits
ipop            0.85
BACK            0.71
yp              0.67
yk              0.67
サーティワン    0.66
phia            0.64
skirt           0.61
Kers            0.60
oning           0.60
skirts          0.60
Activations Density 0.123%