INDEX
Explanations
phrases related to decision-making and actions
phrases related to time-sensitive events and decisions
Negative Logits
Token         Logit
ãĤ©           -0.66
PLA           -0.62
verty         -0.59
rain          -0.56
ertain        -0.56
yle           -0.55
awa           -0.54
mac           -0.54
weeney        -0.52
raq           -0.52
Positive Logits
Token         Logit
altogether    2.30
entirely      1.71
outright      1.30
completely    1.13
lest          1.10
because       1.08
until         1.06
whatsoever    1.01
indefinitely  0.99
anymore       0.98
Activations Density: 0.499%