INDEX
Explanations
modal verbs indicating suggestion or obligation
Negative Logits
Fra        -0.79
Wid        -0.68
WI         -0.65
Puzzle     -0.65
Hilbert    -0.62
Syndrome   -0.61
ROS        -0.59
atile      -0.59
Afgh       -0.59
Kah        -0.58
Positive Logits
ered       1.15
be         1.06
nt         1.01
ideally    0.96
n          0.94
ering      0.94
strive     0.90
aspire     0.90
beware     0.87
ublic      0.87
Activations Density: 1.112%