INDEX
Explanations
phrases indicating a proposed action or suggestion
phrases that express obligation or recommendations
Negative Logits
Fra             -0.66
atile           -0.64
deteriorated    -0.63
CLR             -0.62
Glob            -0.62
adolesc         -0.60
ROS             -0.59
Kah             -0.57
elusive         -0.57
Puzzle          -0.56
Positive Logits
beware          1.01
be              0.97
nt              0.94
ered            0.93
strive          0.92
erest           0.91
n               0.86
aspire          0.84
've             0.83
consider        0.83
Activations Density: 0.068%