Explanations
phrases prompting the reader to take action or pay attention to specific information
phrases that emphasize certainty and encourage action
Negative Logits
Token           Logit
MpServer        -0.85
TPPStreamerBot  -0.82
impl            -0.74
oub             -0.68
bled            -0.68
omal            -0.67
Imagine         -0.66
gery            -0.65
folk            -0.64
Closure         -0.63
Positive Logits
Token           Logit
checking         0.76
beforehand       0.70
Shogun           0.70
!:               0.69
icio             0.64
beware           0.63
clicking         0.63
appropriate      0.63
patience         0.62
Merrill          0.62
Activation Density: 0.046%