INDEX
Explanations
references to destruction or violence
Head Attr Weights
Head  Weight
0     0.03
1     0.03
2     0.13
3     0.06
4     0.14
5     0.03
6     0.13
7     0.16
8     0.04
9     0.04
10    0.07
11    0.07
Negative Logits
Token           Logit
yip             -1.48
00200000        -1.38
aisle           -1.30
soDeliveryDate  -1.24
Owners          -1.22
apolis          -1.21
traveller       -1.20
favoured        -1.19
aceae           -1.13
favourable      -1.13
Positive Logits
Token     Logit
slic      1.54
redd      1.40
Seg       1.35
lesh      1.34
cloth     1.34
gross     1.26
laughter  1.25
Paper     1.24
futures   1.21
mson      1.20
Activations Density 0.000%