Explanations
variations of punctuation and formatting symbols
Negative Logits
Token      Logit
.          -0.55
,          -0.49
to         -0.44
(          -0.42
[blank]    -0.42
có         -0.41
...        -0.39
so         -0.39
<eos>      -0.38
/          -0.38
Positive Logits
Token        Logit
myſelf       1.51
itſelf       1.44
Theſe        1.41
Majefty      1.41
Jefus        1.39
Efq          1.38
purpoſe      1.36
themſelves   1.35
pleaſure     1.34
poffible     1.34
Activations Density: 0.872%