Explanations
phrases indicating certainty or doubt
expressions of certainty or assumptions regarding people's knowledge or opinions
Negative Logits
>]             -0.73
ĻĤ             -0.72
inth           -0.64
ilaterally     -0.62
Exit           -0.62
iasco          -0.62
effectively    -0.61
sole           -0.59
supposedly     -0.59
ocre           -0.59
Positive Logits
delighted       0.80
wouldn          0.80
benefited       0.77
abhor           0.73
disappointed    0.71
minded          0.71
thrilled        0.70
hated           0.69
horr            0.69
pissed          0.68
Activations Density 0.286%