Explanations
sequences of special characters and symbols resembling non-standard text or emoticons
Negative Logits
олоÑĤ
-0.17
åĩºçīĪ社
-0.14
anio
-0.14
964
-0.14
ắt
-0.14
ÃĹ↵↵
-0.14
ibt
-0.13
edList
-0.13
sing
-0.13
á»ķi
-0.13
Positive Logits
proverb
0.16
ift
0.14
Äł
0.14
olf
0.14
çĵľ
0.13
ï¸
0.13
ë¡ľìļ´
0.13
æ°¸ä¹ħ
0.13
Cha
0.13
warmth
0.13
Activations Density 0.023%