INDEX
Explanations
phrases indicating possibility or likelihood
phrases indicating uncertainty or possibility
Negative Logits
athing      -0.76
âĵĺ         -0.73
lier        -0.68
let         -0.67
icial       -0.66
ATS         -0.66
ICO         -0.63
LET         -0.61
managed     -0.61
lets        -0.61
Positive Logits
seem        1.34
sound       1.12
imply       1.00
sounds      0.99
disappoint  0.95
appear      0.94
be          0.94
horr        0.93
offend      0.93
blush       0.92
Activations Density: 0.093%