INDEX
Explanations
special characters, punctuation, and symbols often used in formal or technical contexts
Negative Logits
àª    -0.18
ÑĦ    -0.17
%D    -0.17
à±    -0.17
à¨    -0.17
ëį°ìĿ´íĬ¸    -0.17
ש    -0.16
×ķ×    -0.16
à°    -0.16
á    -0.16
Positive Logits
×IJ    0.28
×Ķ    0.28
×ŀ    0.27
à¦    0.27
×    0.27
×ij    0.27
à    0.26
׾    0.26
ש    0.23
à®    0.22
Activations Density 0.005%
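
Most of the tokens listed above look like mojibake, but they are likely the display form of a byte-level BPE vocabulary rather than encoding damage. The sketch below assumes the tokens use the GPT-2 style byte-to-character alphabet (an assumption; the page does not name the tokenizer) and maps a displayed token back to its raw bytes. The helper names `byte_level_alphabet`, `token_to_bytes`, and `describe` are hypothetical and exist only for this illustration.

```python
def byte_level_alphabet():
    """Rebuild the byte -> display-character table of GPT-2 style byte-level BPE."""
    # Bytes that already have a printable character keep that character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_to_char = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point starting at U+0100.
    offset = 0
    for b in range(256):
        if b not in byte_to_char:
            byte_to_char[b] = chr(256 + offset)
            offset += 1
    return byte_to_char


# Inverse table: display character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in byte_level_alphabet().items()}


def token_to_bytes(token):
    """Map a displayed token back to raw bytes.

    Raises KeyError for characters outside the assumed alphabet,
    for example tokens the page already shows as decoded text.
    """
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


def describe(token):
    """Return the token's bytes as hex plus a best-effort UTF-8 decoding."""
    raw = token_to_bytes(token)
    return raw.hex(" "), raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["àª", "à¦", "×", "%D"]:
        print(repr(tok), describe(tok))
```

Under this assumption, a token such as `àª` corresponds to the bytes `e0 aa`, which is an incomplete UTF-8 sequence: the first two bytes of a three-byte character in the Gujarati range. Likewise `×` corresponds to the single byte `d7`, a lead byte for Hebrew letters. Read this way, the top logits are mostly partial characters from non-Latin scripts, which is worth keeping in mind when reading the explanation text.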