Explanations
Punctuation marks and symbols
Quotation marks followed by specific words
Citations and quotes

Negative Logits
E               -0.82
S               -0.81
e               -0.81
P               -0.80
K               -0.79
WriteLiteral    -0.79
C               -0.78
l               -0.77
I               -0.74
X               -0.72

Positive Logits
myſelf          1.46
Theſe           1.34
Jefus           1.29
himſelf         1.28
ſelves          1.27
pleaſure        1.26
themſelves      1.23
uſed            1.22
whoſe           1.21
ainfi           1.20
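
The two lists above appear to be the ten most negative and ten most positive logit values for this feature, each sorted from the extreme inward. A minimal Python sketch of pulling such lists out of a token-to-score mapping follows; the function name `top_and_bottom` and the dictionary input format are hypothetical, and it assumes the per-token scores have already been computed elsewhere.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom(scores: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k highest and k lowest (token, score) pairs.

    Hypothetical helper: assumes `scores` maps each token to an
    already-computed logit value.
    """
    # Sort ascending by score so the most negative entries come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom


if __name__ == "__main__":
    sample = {"myſelf": 1.46, "Theſe": 1.34, "E": -0.82, "S": -0.81, "X": -0.72}
    positive, negative = top_and_bottom(sample, k=2)
    print(positive)
    print(negative)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a full vocabulary, a partial selection such as `heapq.nlargest` would avoid sorting every entry.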

Activation density: 1.653%