Explanations
female political figures
occurrences of the word "woman" in various contexts
Negative Logits
Token     Logit
asio      -0.78
ollow     -0.71
UFF       -0.66
omer      -0.64
decay     -0.63
vez       -0.63
ollo      -0.62
aze       -0.61
akespe    -0.61
ral       -0.60
Positive Logits
Token     Logit
woman     1.24
士        0.87
women     0.85
uscript   0.81
Woman     0.80
puter     0.80
hood      0.78
Gaga      0.76
gdala     0.76
theless   0.76
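Lists like the two above can be produced with the sketch below. It assumes, as is common for sparse-autoencoder feature dashboards but not stated here, that each token's logit is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix. Tokens are then ranked by that value. The function names `token_logit_effects` and `top_and_bottom` and the toy vectors are hypothetical and only for illustration.

```python
# Hypothetical sketch: rank a feature's direct effect on output-token logits.
# ASSUMPTION: effect[token] = dot(decoder_direction, unembed column of token).
from typing import Dict, List, Sequence, Tuple

def token_logit_effects(decoder_direction: Sequence[float],
                        unembed_columns: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed_columns.items()
    }

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy 2-dimensional example with made-up tokens and vectors.
    toy_effects = token_logit_effects(
        [1.0, -2.0],
        {"tok_a": [3.0, 1.0], "tok_b": [0.0, -1.0],
         "tok_c": [1.0, 2.0], "tok_d": [0.5, 0.0]},
    )
    print(top_and_bottom(toy_effects, 2))
```

With real model weights the vectors would have the model's hidden size and one column per vocabulary entry. Only the ranking step shown here would stay the same.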
Activations Density: 0.009%
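If activation density is read as the percentage of sampled token positions on which the feature activates, then 0.009% is about 9 tokens per 100,000, or 90 per million. That reading is an assumption, since the dashboard does not define the metric here. The sketch below computes such a density. The function name `activation_density` and its `threshold` parameter (activation strictly greater than 0 counts as active) are hypothetical choices.

```python
# Hypothetical sketch: activation density as a percentage of token positions.
# ASSUMPTION: a token counts as "active" when its activation exceeds `threshold`.
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # Toy sample: 100,000 token positions, 9 of which activate the feature.
    toy_activations = [0.0] * 99_991 + [1.5] * 9
    print(activation_density(toy_activations))
```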