INDEX
Explanations
quantifiers and adjectives that express intensity or degree
Head Attr Weights
0:0.17
1:0.10
2:0.04
3:0.11
4:0.05
5:0.11
6:0.06
7:0.03
8:0.14
9:0.04
10:0.05
11:0.05
Negative Logits
argument       -1.71
Synopsis       -1.67
HI             -1.67
append         -1.52
[…]            -1.52
initial        -1.50
Abstract       -1.45
Description    -1.44
article        -1.44
Board          -1.40
Positive Logits
sers           1.69
thumbnails     1.68
eous           1.63
etheless       1.58
-$             1.57
eful           1.56
velt           1.52
Malley         1.51
rican          1.51
retty          1.48
Activations Density 0.001%