INDEX
Explanations
punctuation marks, specifically parentheses and periods
Negative Logits
ings          -0.15
lett          -0.15
ons           -0.14
-ÑĤо          -0.14
-↵↵           -0.13
respective    -0.13
reader        -0.13
xima          -0.12
aille         -0.12
аÐ            -0.12
Positive Logits
s             0.36
Ùĩ            0.24
y             0.19
à¸Ħ           0.19
sian          0.18
samp          0.18
i             0.18
ième          0.18
sak           0.17
ÛĮ            0.16
Activations Density 0.223%