INDEX
Explanations
names, particularly female names
Negative Logits
lemn      -0.17
upo       -0.15
Нас       -0.15
sty       -0.15
oria      -0.14
.nlm      -0.14
rol       -0.14
拳        -0.14
правда    -0.14
als       -0.13
Positive Logits
abouts    0.16
Thumb     0.15
eck       0.15
damned    0.14
&)↵       0.14
okes      0.14
γε        0.14
妙        0.14
ァ        0.14
leşik     0.14
Activations Density 0.021%
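
The density figure above is presumably the share of tokens on which this feature fires at all. That reading is an assumption; the page does not define the term. Under that assumption, a minimal sketch of the calculation might look like this (the function name, the threshold, and the toy data are all hypothetical and not taken from the dashboard):

```python
# Hypothetical helper: share of tokens on which a feature is active.
# Assumes "active" means the activation is strictly above `threshold`.
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 2 active tokens out of 10,000.
toy = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")
```

A density this low would mean the feature is silent on the vast majority of tokens, which fits a feature described as firing on a narrow category such as names.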