INDEX
Explanations
phrases expressing negative sentiment or doubt
Negative Logits
Token           Logit
is              -0.61
i               -0.60
a               -0.59
s               -0.56
tels            -0.56
k               -0.56
y               -0.54
ist             -0.53
b               -0.53
l               -0.53
Positive Logits
Token           Logit
wouldn          1.82
wouldn          1.69
Wouldn          1.51
Wouldn          1.35
wouldnt         1.32
shouldn         1.16
Shouldn         1.14
unlikely        1.11
GenerationType  1.07
CloseOperation  1.03
Activations Density: 0.060%