INDEX
Explanations
sentence beginnings indicating question, calculation, or assumption
Questions and instructions
Negative Logits
Token         Logit
<bos>         -1.62
itſelf        -1.23
myſelf        -1.09
ſelf          -1.08
Jefus         -1.00
Efq           -1.00
himſelf       -0.94
themſelves    -0.91
pleaſure      -0.91
enfans        -0.90
Positive Logits
Token         Logit
?             0.59
,             0.57
(             0.56
dar           0.56
=             0.56
;             0.56
:             0.56
)             0.54
un            0.53
(             0.52
Activations Density: 3.755%