INDEX
Explanations
phrases related to predictions or future outcomes
phrases indicating certainty or predictions about future events
Negative Logits
Token       Logit
onian       -0.66
Franch      -0.65
usc         -0.61
riott       -0.61
Express     -0.60
atures      -0.60
76561       -0.59
osal        -0.59
NOW         -0.59
Register    -0.58
Positive Logits
Token       Logit
remembered  1.03
judged      0.93
harder      0.91
sorely      0.91
eaten       0.90
difficult   0.89
phased      0.88
tougher     0.88
punished    0.86
easier      0.85
Activations Density: 0.173%
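
The density figure can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only. It assumes "active" means an activation strictly above a threshold of zero, and the function name `activation_density` and the toy data are hypothetical, not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as active when its value is strictly
    greater than `threshold` (zero by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 173 active positions out of 100,000 in total.
toy = [1.0] * 173 + [0.0] * 99_827
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.173% corresponds to roughly 17 active positions per 10,000 tokens, so the feature is sparse.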