INDEX
Explanations
phrases or sentences expressing future predictions
phrases indicating future actions or events
Negative Logits
Token      Logit
coni       -0.68
ihad       -0.62
lude       -0.59
cano       -0.57
furt       -0.56
illery     -0.56
uminati    -0.56
krit       -0.54
recalls    -0.54
lake       -0.53
Positive Logits
Token      Logit
to         1.05
nowhere    0.94
downhill   0.81
extinct    0.74
to         0.72
TO         0.71
overboard  0.70
uphill     0.68
©¶æ        0.67
viral      0.66
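The two lists above rank vocabulary tokens by the logit value attributed to this feature, most negative first and most positive first. A minimal sketch of how such lists can be extracted, assuming the per-token values are already available as a plain list aligned with the vocabulary. How the dashboard computes those values is not stated here, and the function name and inputs are hypothetical.

```python
def top_and_bottom_logits(logit_values, vocab, k=10):
    """Return the k most positive and k most negative (token, value) pairs.

    logit_values: one number per vocabulary entry (hypothetical input).
    vocab: token strings, aligned index-by-index with logit_values.
    """
    if len(logit_values) != len(vocab):
        raise ValueError("logit_values and vocab must have the same length")
    pairs = list(zip(vocab, logit_values))
    # Descending sort gives the "Positive Logits" list (largest first).
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Ascending sort gives the "Negative Logits" list (most negative first).
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Toy usage with a few values copied from the tables above.
pos, neg = top_and_bottom_logits(
    [0.94, 0.81, -0.68, -0.53, 0.66],
    ["nowhere", "downhill", "coni", "lake", "viral"],
    k=2,
)
print(pos)  # [('nowhere', 0.94), ('downhill', 0.81)]
print(neg)  # [('coni', -0.68), ('lake', -0.53)]
```

For a full vocabulary, `heapq.nlargest` and `heapq.nsmallest` avoid sorting the whole list, but plain sorting keeps the sketch simple.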
Activation Density: 0.044%
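A density of 0.044% means the feature fires on roughly 44 of every 100,000 tokens. A minimal sketch of that calculation, assuming density is the percentage of sampled tokens whose activation is above zero. The dashboard's exact threshold and token sample are not stated here, and the function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumes density = (active tokens / total tokens) * 100; the
    threshold of 0.0 is an illustrative assumption.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 44 active tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_956 + [1.0] * 44
print(activation_density(sample))  # 0.044
```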