INDEX
Explanations
references to ethnicity and demographic groups in specific locations
Negative Logits
uta      -0.07
bnb      -0.07
oltip    -0.07
nett     -0.07
bons     -0.07
osa      -0.07
fty      -0.07
illac    -0.07
Gram     -0.07
å§ij     -0.07
Positive Logits
Thornton       0.07
em             0.06
ãi             0.06
Salem          0.06
Mes            0.06
reflective     0.06
backgrounds    0.06
generic        0.06
overst         0.06
Clair          0.06
Activations Density: 0.017%