INDEX
Explanations
mentions of specific female names, particularly in relation to discussions of their status or actions
Negative Logits
lee           -0.17
achel         -0.15
bir           -0.15
apses         -0.14
zer           -0.14
Marketable    -0.14
_fmt          -0.14
uchs          -0.14
aler          -0.14
ACHE          -0.14
Positive Logits
agara         0.29
elsen         0.22
itos          0.17
elson         0.16
Ni            0.16
olson         0.16
hoff          0.16
будь          0.16
itsu          0.15
itty          0.15
Activations Density 0.018%