INDEX
Explanations
code-related constructs involving conditions and comparisons
mathematical symbols or code operators
Negative Logits
queſta        -1.10
⟬             -1.05
<unused79>    -0.97
<unused68>    -0.97
<unused43>    -0.96
<unused41>    -0.96
<unused51>    -0.96
<unused52>    -0.96
<unused8>     -0.96
<unused28>    -0.96
Positive Logits
↵↵            0.50
hacerlo       0.50
equally       0.42
↵             0.40
equivalent    0.39
<eos>         0.39
likewise      0.38
farlo         0.38
same          0.38
both          0.37
Activations Density 0.041%