INDEX
Explanations
phrases indicating exclusivity
Head Attr Weights
0:0.09
1:0.09
2:0.09
3:0.08
4:0.08
5:0.08
6:0.09
7:0.07
8:0.08
9:0.07
10:0.07
11:0.07
Negative Logits
Flavoring   -3.00
ipeg        -2.96
Velvet      -2.87
Premium     -2.76
Fine        -2.73
¥           -2.71
Ont         -2.70
Card        -2.64
MpServer    -2.49
Ended       -2.49
Positive Logits
hazards      2.90
hypoc        2.61
destructive  2.60
HRC          2.59
GE           2.56
complexes    2.55
dece         2.51
hazard       2.48
py           2.46
deceptive    2.42
Activations Density 0.000%