INDEX
Explanations
words related to acts of violence or killing
references to animal and human slaughter
Negative Logits
Princ       -0.65
DCS         -0.64
ioxide      -0.63
Crom        -0.63
Ronaldo     -0.61
charism     -0.59
Eucl        -0.59
annis       -0.59
aucas       -0.58
Richards    -0.57
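The source does not say how these logit values were computed. One common approach for feature dashboards of this kind, assumed here, is to project the feature's output direction through the model's unembedding matrix and list the tokens with the most negative and most positive scores. The sketch below uses invented toy vectors and placeholder token names (`tok_a` to `tok_e`); none of the numbers come from this feature.

```python
# Toy illustration only: every vector and token name below is hypothetical.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]

# Assumed: the feature's output direction in a 3-dimensional residual stream.
feature_direction = [1.0, -0.5, 0.25]

# Assumed layout: one unembedding row per vocabulary token.
unembed_rows = [
    [1.0, -0.2, 0.4],
    [0.9, -0.4, 0.2],
    [-0.5, 0.2, -0.2],
    [-0.4, 0.4, -0.16],
    [0.0, 0.0, 0.0],
]


def logit_effects(direction, rows):
    """Dot product of the feature direction with each token's unembedding row."""
    return [sum(d * w for d, w in zip(direction, row)) for row in rows]


def top_and_bottom(tokens, scores, k):
    """Return (k most positive, k most negative) as (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)
    positive = ranked[:k]
    negative = ranked[-k:][::-1]  # most negative first
    return positive, negative


scores = logit_effects(feature_direction, unembed_rows)
positive, negative = top_and_bottom(vocab, scores, k=2)
```

With a real model, `unembed_rows` would have one row per vocabulary entry and `k` would be 10 to match the lists on this page.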
Positive Logits
Slaughter     1.19
houses        1.16
slaughter     1.00
house         0.97
ãĥ¼ãĤ¯        0.94
\\\\\\\\      0.93
quished       0.91
ificial       0.89
slaughtered   0.84
eful          0.81
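The token `ãĥ¼ãĤ¯` in the positive list looks like the printable rendering that GPT-2 style byte-level BPE tokenizers use for raw bytes. Assuming that is the tokenizer in use (the source does not say), the rendering can be mapped back to bytes and decoded as UTF-8. The helper names below are illustrative, not part of any library.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are rendered as themselves.
    keep = set(range(ord("!"), ord("~") + 1))
    keep |= set(range(0xA1, 0xAC + 1))
    keep |= set(range(0xAE, 0xFF + 1))
    mapping = {}
    n = 0
    for b in range(256):
        if b in keep:
            mapping[b] = chr(b)
        else:
            # Remaining bytes are shifted to code points starting at U+0100.
            mapping[b] = chr(256 + n)
            n += 1
    return mapping


def decode_token(rendered):
    """Invert the table and decode the recovered bytes as UTF-8."""
    inverse = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(inverse[ch] for ch in rendered)
    return raw.decode("utf-8", errors="replace")


decoded = decode_token("ãĥ¼ãĤ¯")  # the katakana fragment "ーク"
```

Under this assumption the token is a Japanese katakana fragment, not encoding damage in the dashboard itself.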
Activations Density 0.012%
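To put the density figure in concrete terms: assuming it means the fraction of sampled tokens on which the feature has a nonzero activation (the source does not define it), 0.012% corresponds to roughly 12 active tokens per 100,000. A minimal sketch of that calculation, with a hypothetical `activation_density` helper:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Converting the reported percentage into an expected count per 100,000 tokens.
density_percent = 0.012
expected_active_per_100k = round(density_percent / 100 * 100_000)

# Toy check on made-up activations: one active token out of four.
toy_density = activation_density([0.0, 0.0, 0.0, 1.5])
```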