INDEX
Explanations
References to pronouns, showing connections between characters and their actions
Pronouns followed by prepositions
Negative Logits
when          -0.42
with          -0.42
using         -0.42
at            -0.42
whose         -0.39
a             -0.39
of            -0.38
the           -0.38
in            -0.36
by            -0.36
Positive Logits
majánló        0.99
queſta         0.98
<unused3>      0.98
<unused79>     0.97
<unused16>     0.97
<unused8>      0.97
<unused17>     0.97
<unused23>     0.97
<unused14>     0.97
[@BOS@]        0.97
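The two tables list the tokens whose output logits this feature most decreases and most increases. A common way such lists are produced is to take the dot product of the feature's decoder direction with each token's unembedding vector and rank the results. The minimal sketch below assumes exactly that and uses toy two-dimensional inputs; `logit_effects`, `top_and_bottom`, and the dictionary layout of `unembed` are hypothetical, and the dashboard may scale or normalize its values in ways this sketch does not.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's
    unembedding vector to get that token's raw logit effect (no scaling)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "when": [-0.2, 0.4],
        "the": [0.1, 0.2],
        "queſta": [0.9, -0.1],
    }
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
    print(pos, neg)
```

Ranking on raw dot products keeps the sketch transparent; a real dashboard computes the same kind of quantity over the full vocabulary with matrix operations rather than Python loops.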
Activation Density: 0.040%
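Activation density is usually read as the share of token positions on which the feature fires at all. The minimal sketch below assumes "fires" means an activation strictly above zero; `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact threshold and sampled token set are not stated here.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: the feature fires on 4 of 10,000 tokens.
    acts = [0.0] * 9_996 + [1.2, 0.8, 2.5, 0.3]
    print(f"{activation_density(acts):.3f}%")
```

Under this reading, a density of 0.040% corresponds to roughly 4 active positions per 10,000 tokens, which marks this as a sparsely firing feature.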