Explanations
words related to dehumanization and its effects
Negative Logits
Elli          -0.16
olding        -0.15
Typ           -0.15
deck          -0.15
Amph          -0.15
gend          -0.15
comp          -0.15
ruz           -0.14
Ele           -0.14
StreamReader  -0.14
Positive Logits
human         0.20
omon          0.18
facto         0.18
rig           0.17
.construct    0.17
value         0.17
grading       0.16
construct     0.16
icide         0.16
优            0.16
Activation Density: 0.019%