Explanations
percentage symbols and related formatting in code
Negative Logits
…           -0.40
Hallen      -0.40
forward     -0.38
ta          -0.37
ạ           -0.36
confirm     -0.35
dabei       -0.35
Kirkland    -0.34
,           -0.34
forward     -0.34
Positive Logits
<?          0.97
"><?        0.96
"<?         0.94
'<?         0.91
<?          0.91
:%          0.88
${\         0.87
${\         0.86
><?         0.86
(%          0.85
Activations Density 0.208%