INDEX
Explanations
punctuation marks, especially exclamation points and periods
Negative Logits
はず  -0.17
gnore  -0.15
正  -0.13
(em space, U+2003)  -0.13
ramid  -0.13
آخرین  -0.13
Dou  -0.13
しく  -0.13
412  -0.13
(*(  -0.13
Positive Logits
1  0.40
01  0.28
۱  0.24
１  0.24
①  0.20
№  0.18
₁  0.18
▍  0.17
anik  0.16
firstly  0.16
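Dashboards like this one commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then read off the tokens with the highest and lowest scores. The sketch below assumes that procedure; it is not confirmed by this page. The function names (`logit_effects`, `top_and_bottom`), the plain nested-list layout for the unembedding (d_model rows by vocab_size columns), and the toy values are all hypothetical.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], w_u: List[List[float]]) -> List[float]:
    """Hypothetical helper: dot the feature's decoder direction with each token's
    unembedding column. decoder_dir has length d_model; w_u is d_model x vocab_size."""
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]

def top_and_bottom(
    effects: List[float], vocab: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy example (assumed values): d_model = 2, vocabulary of 4 tokens.
decoder_dir = [1.0, -0.5]
w_u = [
    [0.5, 0.25, -0.25, 0.0],
    [0.0, 0.5, 0.25, -0.5],
]
vocab = ["one", "period", "ignore", "first"]
positive, negative = top_and_bottom(logit_effects(decoder_dir, w_u), vocab, k=2)
print(positive)  # [('one', 0.5), ('first', 0.25)]
print(negative)  # [('ignore', -0.375), ('period', 0.0)]
```

The toy values are powers of two, so the printed scores are exact. A real dashboard would run the same projection over the full vocabulary, usually with a matrix library instead of Python loops.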
Activation Density 0.081%
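An activation density of 0.081% means the feature is active on roughly 81 of every 100,000 tokens. The sketch below computes density as the percentage of token positions whose activation exceeds a threshold. The default threshold of 0.0 and the function name `activation_density` are assumptions, not this dashboard's documented definition.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions where the feature fires,
    assuming "fires" means activation > threshold (threshold=0.0 is an assumption)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 81 firing positions out of 100,000 tokens gives a density of about 0.081%.
sample = [0.0] * 99_919 + [1.0] * 81
print(activation_density(sample))
```

The threshold is exposed as a parameter because some tools count only activations above a small positive cutoff. The choice changes the reported density.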