Explanations
references to the concept of "man" or masculine figures
Negative Logits
Token          Logit
']")           -0.95
^(@)           -0.95
للمعارف        -0.92
$.             -0.88
","","         -0.87
BibitemShut    -0.87
"]);           -0.86
"])            -0.85
piac           -0.84
']").          -0.84
Positive Logits
Token          Logit
man            1.76
Man            1.72
Man            1.59
MAN            1.51
man            1.50
woman          1.31
MAN            1.27
mans           1.20
Woman          1.17
men            1.17
Activations Density 0.081%