Explanations
references to different nationalities or ethnic groups
Negative Logits
Token       Logit
Variables   -0.14
Sesso       -0.14
ä¹¾         -0.14
_prim       -0.14
eÅŁ         -0.13
phy         -0.13
äºĭæ¥Ń      -0.13
áºŃt        -0.13
uzey        -0.13
iper        -0.13
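Several tokens in the negative-logit list (for example ä¹¾, eÅŁ, äºĭæ¥Ń and áºŃt) look like mojibake but are more likely raw byte-level BPE token strings. The sketch below assumes the underlying model uses a GPT-2-style byte-level tokenizer, which the page does not state; under that assumption each displayed character stands for one byte, and the bytes can be reassembled into UTF-8 text. The names `bytes_to_unicode`, `CHAR_TO_BYTE` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table, as used by GPT-2-style byte-level BPE.

    Assumption: the tokens on this page were produced by such a tokenizer.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point from U+0100 up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed token string back into readable text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Single tokens may hold only part of a UTF-8 sequence, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ä¹¾", "Japanese", "Ġthe"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as Japanese pass through unchanged, and a leading Ġ becomes a space. If the model uses a different tokenizer, the mapping does not apply and the raw strings should be left as shown.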
Positive Logits
Token       Logit
apanese     0.18
Japanese    0.16
Americans   0.15
sonian      0.15
Spanish     0.15
Russians    0.15
essian      0.15
Mos         0.15
English     0.15
vat         0.14
Activations Density: 0.112%
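The page does not say how the density figure is computed. A common reading is the share of sampled tokens on which the feature has a non-zero activation; the sketch below follows that reading. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature fires at all; the dashboard's exact definition is not given.
    """
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: the feature fires on 1 of 10 tokens.
    sample = [0.0] * 9 + [1.5]
    print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.112% would mean the feature fires on roughly one token in every 900.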