INDEX
Explanations
exclamations or questions expressing emotional responses
question marks
Negative Logits
Token            Logit
Diſ              -0.99
Houſe            -0.96
―――――            -0.95
purpoſe          -0.92
مرئيه            -0.90
Majefty          -0.90
Anſ              -0.88
$_"              -0.88
depositphotos    -0.88
iſt              -0.87
Positive Logits
Token            Logit
<bos>            2.36
the              1.05
and              0.97
'                0.88
of               0.80
for              0.78
in               0.77
is               0.73
↵↵               0.72
to               0.71
Activations Density: 0.668%