INDEX
Explanations
question marks and query indicators in the text
Negative Logits
[token missing]   -0.90
.                 -0.76
(                 -0.71
↵↵                -0.66
I                 -0.65
In                -0.61
</i>              -0.59
in                -0.59
,                 -0.59
a                 -0.58
Positive Logits
="?       1.39
?'        1.34
$?        1.31
?...      1.30
'?'       1.28
!?        1.27
?<        1.22
Majefty   1.22
'?        1.20
?!?       1.20
Activations Density 0.143%
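
The page does not say how the density figure is computed. A common reading, assumed here, is the percentage of tokens on which the feature's activation is above zero. The sketch below illustrates that reading on toy data; the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all. The page itself does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data (made up): the feature fires on 143 of 100,000 tokens.
sample = [0.0] * 99_857 + [1.5] * 143
density = activation_density(sample)
print(f"Activation density: {density:.3f}%")
```

Under this reading, a density of 0.143% means the feature is silent on the vast majority of tokens, which fits a feature tied to a narrow pattern such as question marks.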