INDEX
Explanations
verbs expressing preferences or opinions
expressions of appreciation or positive feedback
Negative Logits
MRI       -0.82
strate    -0.81
elaide    -0.81
owned     -0.78
cum       -0.76
udder     -0.75
herer     -0.75
perse     -0.74
arate     -0.73
strap     -0.70
Positive Logits
idea          1.22
outcome       1.14
entirety      1.09
possibility   1.09
notion        1.07
latter        1.06
same          1.02
plight        0.99
slightest     0.99
sheer         0.98
Activations Density: 0.377%