Explanations
prominent female figures or references to women
Negative Logits
oste    -0.15
ova     -0.14
Picks   -0.13
Sala    -0.13
epar    -0.13
wers    -0.13
เทศ     -0.13
ALLE    -0.13
nej     -0.13
atsu    -0.13
Positive Logits
said    0.16
empo    0.16
ーター  0.15
SCALL   0.15
pto     0.14
bons    0.14
odium   0.14
λία     0.14
zza     0.14
によ    0.14
Activations Density 0.380%