INDEX
Explanations
punctuation marks, particularly question marks and periods
Negative Logits
"         -0.17
'.$       -0.16
ÂĿ        -0.16
'         -0.16
corners   -0.15
boil      -0.15
tal       -0.15
i         -0.15
ssize     -0.14
ľ         -0.14
Positive Logits
iddi      0.18
”↵        0.17
ær        0.16
”↵        0.16
ulton     0.16
uyá»ĥn    0.15
elles     0.15
åĪ·       0.15
üven      0.15
Ỽ         0.15
Activations Density: 0.026%