Explanations
phrases indicating improbability or unlikelihood
expressions of improbability or skepticism regarding future events
Negative Logits
ravings     -0.88
ocked       -0.82
artney      -0.82
zeb         -0.80
ulative     -0.79
ixed        -0.78
atered      -0.77
bara        -0.74
insula      -0.73
aeper       -0.72
Positive Logits
unlikely     0.92
unanim       0.83
unanimous    0.82
bably        0.82
theless      0.81
icably       0.80
improbable   0.77
fortun       0.75
impeachment  0.74
infall       0.74
Activations Density: 0.007%