INDEX
Explanations
trigger words indicating a sequential event or action
instances of the word "Following."
Negative Logits
Token     Logit
imm       -0.76
adle      -0.72
immer     -0.70
inese     -0.67
agin      -0.66
access    -0.66
cci       -0.65
eri       -0.65
vere      -0.63
elo       -0.63
Positive Logits
Token         Logit
noon          0.81
Īè            0.73
follows       0.71
SourceFile    0.71
teen          0.69
Following     0.68
Following     0.65
Sym           0.65
Steps         0.65
>:            0.64
Activations Density: 0.012%