INDEX
Explanations
Punctuation marks and specific formatting characters in text
Negative Logits
.      -0.63
(      -0.61
↵↵     -0.56
“      -0.56
(      -0.53
&      -0.52
,      -0.52
↵      -0.51
e      -0.50
       -0.49
Positive Logits
Theſe      1.32
myſelf     1.28
Efq        1.27
himſelf    1.21
ainfi      1.18
becauſe    1.15
Jefus      1.15
ſeveral    1.15
Majefty    1.14
againſt    1.13
Activation Density: 0.565%