INDEX
Explanations
punctuation marks, particularly periods and question marks
Negative Logits
hd          -0.06
ADATA       -0.06
åζ          -0.06
ml          -0.06
yt          -0.06
Michaels    -0.06
insky       -0.06
vard        -0.06
haft        -0.06
berry       -0.05
Positive Logits
plural      0.08
plural      0.07
_SUITE      0.07
sehen       0.07
pson        0.06
aty         0.06
же          0.06
Yeni        0.06
ué          0.06
_patch      0.06
Activations Density 0.136%