Explanations
words related to oppression and injustices faced by various groups
Negative Logits
Animations   -0.17
otel         -0.15
opposition   -0.15
_PROXY       -0.15
MOTE         -0.15
animate      -0.14
Ð¡Ðł         -0.14
oud          -0.14
Anim         -0.14
ÐĴÐIJ         -0.14
Positive Logits
cha          0.17
oster        0.16
dub          0.15
NCY          0.15
inar         0.14
chan         0.14
riet         0.14
ÏĥÏĦά        0.14
ischer       0.14
BI           0.13
Activations Density 0.007%