Explanations
high-frequency words and pronouns typically used in discussions
had been analyzed
Negative Logits
Token       Logit
<eos>       -0.28
            -0.28
↵           -0.27
wynosi      -0.26
ambilan     -0.25
również     -0.25
Grenze      -0.24
bluza       -0.24
↵↵          -0.23
is          -0.23
Positive Logits
Token       Logit
<unused41>  1.00
<unused79>  0.99
[@BOS@]     0.99
<unused43>  0.99
<unused52>  0.99
<unused28>  0.99
<unused68>  0.99
<unused74>  0.99
<unused14>  0.99
<unused23>  0.99
Activations Density: 0.157%
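
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions on
    which the feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3%}")  # prints 0.200%
```

Under this reading, a density of 0.157% would mean the feature is active on roughly 1.6 of every 1000 sampled tokens.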