INDEX
Explanations
phrases or sentences indicating a sequence of actions
instances of the word "then," indicating sequential or instructional contexts
Negative Logits
ut          -0.66
pires       -0.65
orum        -0.64
chio        -0.63
cap         -0.63
ction       -0.61
borgh       -0.61
scription   -0.60
caps        -0.60
hered       -0.59
Positive Logits
srf                 0.82
proceeded           0.81
Ń·                  0.78
veland              0.77
Ͻ                   0.77
EStream             0.75
proceed             0.69
Extras              0.68
————————————————    0.68
igslist             0.68
Activations Density 0.045%