INDEX
Explanations
phrases related to racism and discrimination
Negative Logits
illo             -0.16
asaki            -0.15
illac            -0.14
å°               -0.14
asshole          -0.14
aday             -0.14
ThanOrEqualTo    -0.14
uco              -0.14
psilon           -0.14
NSS              -0.13
Positive Logits
foreign          0.18
tar              0.16
ãĤĨ              0.16
perceived        0.15
agy              0.15
Foreign          0.15
htm              0.15
technology       0.14
bog              0.14
ipy              0.14
Activations Density 0.140%