Explanations
references to caste, ethnicity, and socioeconomic status
Negative Logits
arto      -0.17
Pron      -0.16
ibo       -0.15
zen       -0.14
eri       -0.14
ipo       -0.14
undles    -0.14
Dy        -0.14
env       -0.13
lik       -0.13
Positive Logits
별            0.34
-specific     0.30
別            0.26
pecific       0.26
_specific     0.26
specific      0.24
specific      0.24
Specific      0.23
specificity   0.21
Specific      0.21
Activations Density 0.195%