Explanations
a specific pattern of repeated symbols and formatting indicators in the text
Negative Logits
gridx         -0.71
Rond          -0.69
Gren          -0.67
Fron          -0.67
Fron          -0.67
EndContext    -0.66
zeera         -0.65
)");          -0.64
dign          -0.63
consultato    -0.63
Positive Logits
*             1.64
!*            1.51
:*            1.45
?*            1.42
>*            1.41
-*            1.41
()*           1.39
$*$           1.36
.*            1.34
$*            1.34
Activations Density: 0.873%