Explanations
references to plants and leadership
Negative Logits
<unused43>   -1.09
<unused41>   -1.09
<unused74>   -1.09
<unused3>    -1.08
<unused14>   -1.08
[@BOS@]      -1.08
<unused23>   -1.08
<unused42>   -1.08
<unused17>   -1.08
<pad>        -1.07
Positive Logits
,            0.85
[missing]    0.83
↵↵           0.82
↵            0.82
.            0.75
1            0.73
system       0.72
2            0.72
<eos>        0.70
(            0.70
Activations Density 0.426%