INDEX
Explanations
references to women and their roles or characteristics
Negative Logits
eing      -0.16
gaard     -0.15
Ìģ        -0.15
ذÙĩ       -0.14
leton     -0.14
posix     -0.14
grund     -0.14
emales    -0.14
Fi        -0.13
antar     -0.13
POSITIVE LOGITS
hood        0.17
Sharper     0.15
ityEngine   0.14
oi          0.14
AMI         0.14
lok         0.14
omaly       0.14
elijke      0.14
XP          0.14
agers       0.14
Activations Density 0.040%