INDEX
Explanations
occurrences of the period (.) punctuation mark
Negative Logits
Token              Logit
wr                 -0.16
-                  -0.15
(not shown)        -0.15
,↵                 -0.15
esion              -0.14
cí                 -0.14
228                -0.14
eed                -0.14
âĢ (bytes E2 80)   -0.13
&                  -0.13
Positive Logits
Token              Logit
页面存档备份        0.16
latter             0.15
arde               0.15
뿐                 0.15
↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵↵  0.15
Lesser             0.14
.increment         0.14
zano               0.14
gether             0.14
потол              0.13
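SAE dashboards commonly build the positive and negative logit lists the same way. They project the feature's decoder direction onto the model's unembedding matrix, then report the tokens with the largest and smallest resulting values. The sketch below assumes that convention. `logit_effects`, `decoder_dir`, `W_U`, and the toy vocabulary are hypothetical names, not part of any dashboard's API.

```python
import numpy as np


def logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray, vocab: list, k: int = 10):
    """Hypothetical helper: rank tokens by how much a feature pushes their logits.

    decoder_dir: feature decoder direction, shape (d_model,)
    W_U: unembedding matrix, shape (d_model, vocab_size)
    vocab: token strings, length vocab_size
    Returns (positive, negative), each a list of (token, value) pairs.
    """
    effects = decoder_dir @ W_U          # direct logit effect per token, shape (vocab_size,)
    order = np.argsort(effects)          # token indices sorted by effect, ascending
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with d_model = 2 and a four-token vocabulary (made-up numbers).
positive, negative = logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1, -0.4], [9.0, 9.0, 9.0, 9.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(positive)  # [('a', 0.5), ('c', 0.1)]
print(negative)  # [('d', -0.4), ('b', -0.2)]
```

This measures only the direct path from the feature to the output logits. It ignores any effect the feature has through later layers. That is why dashboard lists like the ones above can contain tokens unrelated to the feature's explanation.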
Activation density: 0.103%
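Suppose activation density means the percentage of token positions in a sample dataset where the feature's activation is above zero. Then 0.103% corresponds to roughly one token in 970 (1 / 0.00103 ≈ 970.9). Below is a minimal sketch of that calculation. The helper `activation_density` and the activations are hypothetical.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    # Percentage of token positions where the feature fires (activation > threshold).
    return 100.0 * float(np.mean(activations > threshold))


acts = np.zeros(100_000)
acts[:103] = 1.0  # hypothetical: the feature fires on 103 of 100,000 tokens
print(round(activation_density(acts), 3))  # 0.103
```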