INDEX
Explanations
references to violent crimes
Head Attr Weights
0:0.03
1:0.02
2:0.14
3:0.07
4:0.04
5:0.02
6:0.28
7:0.16
8:0.03
9:0.03
10:0.07
11:0.05
Negative Logits
etheless          -2.25
theless           -1.68
anwhile           -1.67
"$:/              -1.63
ichever           -1.52
FANTASY           -1.49
":""},{"          -1.40
forth             -1.36
soDeliveryDate    -1.35
iland             -1.34
Positive Logits
gery              1.67
ilation           1.37
phy               1.31
ission            1.30
Delivery          1.27
geries            1.27
Sacrifice         1.26
sed               1.24
iaz               1.21
Result            1.19
Activations Density: 0.001%