INDEX
Explanations
phrases indicating uncertainty or possibility
instances of uncertainty or tentative statements
Negative Logits
oun      -0.75
wagon    -0.72
mosqu    -0.72
rall     -0.71
compr    -0.71
advoc    -0.70
bil      -0.67
perty    -0.67
onga     -0.66
withd    -0.65
Positive Logits
Absolutely    1.03
Especially    0.98
But           0.97
Because       0.96
org           0.94
Anyway        0.94
Unless        0.94
Secondly      0.92
And           0.92
That          0.91
Activations Density: 0.458%