INDEX
Explanations
references to guidance and leading influence in narratives
Negative Logits
elman     -0.17
going     -0.16
์เพ       -0.15
for       -0.15
tails     -0.15
_scalar   -0.15
ระด       -0.14
Ac        -0.14
dik       -0.14
ź         -0.14
POSITIVE LOGITS
into      0.31
toward    0.27
towards   0.25
away      0.25
astr      0.24
Into      0.24
into      0.22
Away      0.21
INTO      0.20
onto      0.19
Activations Density 0.068%