Explanations
phrases indicating a future prediction or expectation
references to future expectations or events
Negative Logits
orthodox     -0.87
raid         -0.85
ricular      -0.84
illusion     -0.79
tops         -0.79
inguished    -0.78
icon         -0.77
elsius       -0.76
orsi         -0.76
²¾           -0.76
Positive Logits
undone       1.11
forth        0.92
WRITE        0.84
out          0.80
knocking     0.80
ashore       0.79
up           0.76
INTO         0.74
flooding     0.74
Forth        0.74
Activations Density: 0.060%