INDEX
Explanations
references to specific national or ethnic identities
Negative Logits
lessly         -0.23
©              -0.18
ful            -0.17
prostitutas    -0.15
riers          -0.15
lying          -0.15
ship           -0.14
eson           -0.14
acles          -0.14
rier           -0.14
Positive Logits
/OR            0.20
-Americans     0.20
-American      0.18
boro           0.18
-Muslim        0.16
/N             0.16
living         0.16
/Linux         0.16
ians           0.15
/left          0.15
Activations Density: 0.098%