Explanations
references to racism and racial discrimination
Negative Logits
ullo      -0.17
\xaa      -0.15
-bs       -0.14
chio      -0.14
orta      -0.14
ảo        -0.14
,eg       -0.14
esinde    -0.13
.echo     -0.13
olest     -0.13
Positive Logits
disposable    0.16
foreign       0.16
nos           0.15
inferior      0.15
dusk          0.15
pathology     0.15
385           0.15
deemed        0.15
\Object       0.15
fortune       0.15
Activations Density: 0.173%