Explanations
phrases indicating impossibility or denial
negative sentiments or expressions of inability
Negative Logits
Token       Logit
rather      -0.78
probably    -0.73
might       -0.72
staking     -0.66
arently     -0.66
Might       -0.65
maybe       -0.65
clarify     -0.63
not         -0.63
prototype   -0.63
Positive Logits
Token       Logit
able        1.27
tolerated   1.20
anymore     1.03
bothered    1.00
anywhere    0.97
allowed     0.96
permitted   0.93
construed   0.93
dissu       0.93
swayed      0.91
Activation density: 0.147%