INDEX
Explanations
phrases related to decision-making and approval
expressions of potential outcomes or alternatives
Negative Logits
)</          -0.73
estern       -0.62
but          -0.57
heric        -0.56
?",          -0.55
*)           -0.55
ursday       -0.52
erent        -0.51
asus         -0.51
leted        -0.51
Positive Logits
yet          1.13
yet          0.97
nor          0.86
anymore      0.84
anyway       0.79
either       0.77
unless       0.77
anytime      0.75
whatsoever   0.72
.            0.72
Activations Density 0.599%