INDEX
Explanations
references to the name "Mary" or variations thereof
Negative Logits
kova      -0.18
abbage    -0.17
zew       -0.16
ayet      -0.15
FAQ       -0.15
tega      -0.15
staw      -0.14
hower     -0.14
claims    -0.14
otive     -0.14
Positive Logits
ann       0.24
mount     0.22
ellen     0.22
sville    0.21
umm       0.20
kn        0.20
Ann       0.20
beth      0.20
trs       0.19
Mag       0.19
Activations Density 0.008%