Explanations
references to entities, their characteristics, and related numerical data or results
Negative Logits
Token      Logit
,          -0.52
↵          -0.35
(          -0.30
"          -0.30
<eos>      -0.29
2          -0.27
1          -0.25
left       -0.25
)          -0.25
running    -0.25
Positive Logits
Token        Logit
ſſung        1.02
rungsseite   1.02
niſſe        1.01
<unused43>   1.01
<unused79>   1.01
<unused28>   1.00
<unused41>   1.00
<unused14>   1.00
<pad>        1.00
<unused8>    1.00
Activations Density 1.324%