Explanations
terms related to causing harm or injury
NEGATIVE LOGITS
wagen      -0.82
cius       -0.79
ortment    -0.76
runner     -0.73
shore      -0.71
mbuds      -0.69
zyme       -0.68
chrom      -0.68
Ou         -0.68
clinton    -0.67
POSITIVE LOGITS
wounds        1.10
havoc         1.06
inflicted     1.02
damage        0.99
inflict       0.97
injuries      0.87
carnage       0.86
humiliation   0.84
inflic        0.83
griev         0.83
Activations Density: 0.013%