INDEX
Explanations
expressions of moral and ethical condemnation related to conflict and suffering
Negative Logits
Token        Logit
ãĥģãĥ£       -0.15
phái         -0.15
cus          -0.15
ermen        -0.14
Roose        -0.14
pak          -0.14
Cli          -0.14
handshake    -0.14
foo          -0.13
ORK          -0.13
Positive Logits
Token        Logit
dispos       0.18
sett         0.16
gaard        0.16
apartheid    0.16
annes        0.15
Gros         0.15
Pall         0.15
ey           0.14
jspx         0.14
ĭ            0.14
Activations Density 0.024%