INDEX
Explanations
punctuation marks indicating the start and end of sentences
Negative Logits
<![      -0.15
anger    -0.15
ษ        -0.15
iddle    -0.14
uro      -0.14
روی      -0.14
ó        -0.14
mares    -0.13
å£       -0.13
ivity    -0.13
Positive Logits
•        0.32
•        0.28
.•       0.19
\$       0.18
atan     0.16
ç¾       0.16
ISA      0.15
цов      0.15
swe      0.14
|↵↵      0.14
Activations Density 0.130%