INDEX
Explanations
references to specific nationalities and ethnic groups
Negative Logits
lessly         -0.19
prostitutas    -0.17
eson           -0.15
edor           -0.15
lying          -0.15
ful            -0.14
ship           -0.14
berapa         -0.14
344            -0.14
tainment       -0.14
Positive Logits
/OR            0.20
-American      0.19
-Americans     0.17
/left          0.17
who            0.16
imals          0.16
-Muslim        0.15
boro           0.15
/N             0.15
aurus          0.14
Activations Density: 0.101%