Explanations
instances of the word "text" in various contexts
Negative Logits
Token       Logit
in          -0.35
,           -0.34
–           -0.33
↵↵          -0.33
»,          -0.32
->          -0.32
<eos>       -0.31
>           -0.31
'@/         -0.30
\"></       -0.30
Positive Logits
Token       Logit
<unused74>  0.94
[@BOS@]     0.94
<unused41>  0.94
<unused79>  0.94
<unused8>   0.94
<pad>       0.94
<unused14>  0.94
<unused43>  0.94
<unused23>  0.94
<unused51>  0.94
Activations Density 0.001%