Explanations
violent incidents involving harm to people
Negative Logits
Token                 Logit
isSpecialOrderable    -0.83
ĨĴ                    -0.81
Relax                 -0.71
hedon                 -0.70
Awareness             -0.70
Applic                -0.70
yi                    -0.69
Aware                 -0.68
QUEST                 -0.67
lear                  -0.67
Positive Logits
Token                 Logit
injuring              1.56
wounding              1.44
destroying            1.43
injure                1.39
killing               1.32
wreck                 1.28
robbing               1.23
murdering             1.20
cripp                 1.19
inciner               1.19
Activations Density: 0.244%