INDEX
Explanations
phrases related to planning and future actions
Negative Logits
Token        Logit
šov          -0.17
pv           -0.16
eniz         -0.15
utherland    -0.14
ussia        -0.14
dn           -0.14
udes         -0.14
agem         -0.14
ńst          -0.14
rega         -0.14
Positive Logits
Token        Logit
next         0.73
next         0.60
future       0.53
_next        0.52
(next        0.51
.next        0.50
next         0.49
-next        0.48
Next         0.47
näch         0.47
Activations Density 0.390%