INDEX
Explanations
phrases indicating the presence of limitations or conditions
Head Attr Weights
0:0.10
1:0.03
2:0.04
3:0.23
4:0.02
5:0.06
6:0.04
7:0.09
8:0.05
9:0.04
10:0.16
11:0.09
Negative Logits
enhagen     -1.14
Mutual      -1.07
gently      -0.96
Strategies  -0.96
Conan       -0.96
ullivan     -0.95
)].         -0.94
Guinness    -0.94
Ake         -0.92
ensional    -0.90
Positive Logits
whatsoever  2.56
nor         1.95
anymore     1.48
nor         1.38
slightest   1.33
anywhere    1.26
dime        1.16
EVER        1.09
except      1.09
ever        1.09
Activations Density: 0.171%