INDEX
Explanations
words related to specific groups of people or nationalities
references to various ethnic or national groups
Negative Logits
ABLE         -0.71
ASED         -0.69
increments   -0.69
ORY          -0.65
abus         -0.64
PDATE        -0.64
ANC          -0.63
ISON         -0.63
stre         -0.62
judgment     -0.62
Positive Logits
aurus        1.37
ervative     1.08
ervatives    1.03
hare         1.03
hip          1.03
hips         1.02
paces        1.01
Anonymous    0.94
kaya         0.93
cale         0.92
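The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A common way dashboards of this kind compute such lists is to project the feature's decoder direction through the model's unembedding matrix and take the bottom-k and top-k entries. The sketch below assumes that method; the names `logit_effects`, `decoder_dir`, and `unembed` are hypothetical, and it is not necessarily how these particular numbers were produced.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(
    decoder_dir: Sequence[float],          # hypothetical: feature decoder vector, length d_model
    unembed: Sequence[Sequence[float]],    # hypothetical: W_U laid out as [d_model][vocab_size]
    vocab: Sequence[str],                  # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Return (k most negative, k most positive) (token, logit) pairs."""
    d_model = len(decoder_dir)
    # Direct logit effect: logits[j] = sum_i decoder_dir[i] * W_U[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit; the front holds the most negative tokens.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    neg, pos = logit_effects([1.0, -1.0], [[1, 0, 3], [0, 1, 1]], ["a", "b", "c"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Real dashboards do this with a single matrix-vector product over the full vocabulary; the nested loop here only keeps the sketch dependency-free.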
Activation Density: 0.128%
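Activation density is usually read as the share of sampled tokens on which the feature fires. At 0.128%, that is roughly 1 token in every 780. The sketch below assumes "fires" means an activation strictly above a threshold of 0 and that density is reported as a percentage. The function name and the `threshold` parameter are hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one token activation")
    # Count tokens where the feature is active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    sample = [0.0] * 999 + [2.5]  # one firing token out of 1000
    print(f"{activation_density(sample):.3f}%")
```

A density this low marks the feature as sparse: it stays silent on almost all tokens and activates only in narrow contexts.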