INDEX
Explanations
mentions of women and references to their roles or relationships in various contexts
Negative Logits
ikip      -0.15
Royale    -0.15
itet      -0.15
elier     -0.15
amura     -0.15
ÎŃ        -0.14
thon      -0.14
annon     -0.13
118       -0.13
ntag      -0.13
Positive Logits
iral           0.19
alike          0.17
ç̬             0.14
CKET           0.13
chron          0.13
лиÑĪ           0.13
ãĤ¹ãĤ¿ãĥ¼      0.13
íķ             0.13
822            0.13
Desktop        0.13
Activations Density 0.013%