INDEX
Explanations
adverbs indicating likelihood or expectation
phrases that assert expectations or recommendations
Negative Logits
CI        -0.67
Patty     -0.62
Afgh      -0.57
Rox       -0.57
Cir       -0.56
Mehran    -0.56
yss       -0.54
Ends      -0.54
Fra       -0.54
HER       -0.54
Positive Logits
ideally          1.15
ered             1.11
be               1.09
ering            1.04
suffice          1.04
theoretically    0.93
nt               0.93
NEVER            0.89
beware           0.87
definitely       0.86
Activations Density 0.068%