INDEX
Explanations
Punctuation, particularly periods at the end of sentences
Negative Logits
Token       Logit
oku         -0.71
��          -0.69
overboard   -0.69
estamp      -0.64
iances      -0.64
atories     -0.64
oma         -0.63
nightly     -0.62
gling       -0.62
icus        -0.62
Positive Logits
Token       Logit
田          0.79
Benz        0.77
Emer        0.76
ospons      0.72
Reloaded    0.72
TYPE        0.71
Fill        0.71
cumbers     0.70
escal       0.69
JD          0.69
Activations Density 0.015%
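
As a rough illustration of how a density figure like this can be read, the sketch below assumes that density means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the feature fires.
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: the feature fires on 3 of 20,000 token positions.
sample = [0.0] * 19_997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.015% would mean the feature is active on roughly 3 out of every 20,000 tokens, which is consistent with a sparse feature.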