INDEX
Explanations
references to the second person pronouns "you" and "your."
your possessions, health, actions
Negative Logits
Token       Logit
↵↵          -0.42
↵           -0.39
[missing]   -0.37
<eos>       -0.36
the         -0.36
of          -0.36
No          -0.35
.           -0.34
res         -0.34
for         -0.34
Positive Logits
Token             Logit
AssemblyCulture   0.94
queſta            0.91
:✨                0.91
ſind              0.87
<unused16>        0.87
<unused8>         0.87
<unused14>        0.86
<unused17>        0.86
<unused3>         0.86
[@BOS@]           0.86
Activations Density 0.045%