INDEX
Explanations
characters from different languages and scripts, as well as special characters
characters or symbols used in encoding or text representation
Negative Logits
hyde       -0.87
mathemat   -0.74
owler      -0.73
ufact      -0.71
espie      -0.70
ILCS       -0.69
esville    -0.69
abase      -0.68
ngth       -0.68
assic      -0.68
Positive Logits
×Ļ×        1.17
IJ         1.16
ת          1.14
׾          1.08
×ķ         1.07
κ          1.00
ä¸ī        0.98
ãĤī        0.98
×          0.92
ר          0.91
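Several of the positive-logit tokens (for example ×Ļ×, ×ķ, ä¸ī, ãĤī) look like the printable stand-ins that GPT-2-style byte-level BPE vocabularies use for raw UTF-8 bytes. The page does not say which tokenizer produced them, so that is an assumption. Under it, a minimal sketch like the following maps such a token back to its bytes and, where the bytes form complete UTF-8, to the character they encode. The helper names `token_to_bytes` and `decode_token` are hypothetical and chosen for illustration only.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the tokens on this page come from such a vocabulary.
    """
    # Bytes that are already printable keep their own code point:
    # '!'..'~', 0xA1..0xAC and 0xAE..0xFF.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point from 256 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Return the raw bytes behind a displayed token (hypothetical helper).

    Raises KeyError for characters outside the 256-entry table.
    """
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def decode_token(token):
    """Decode a displayed token as UTF-8, marking incomplete sequences with U+FFFD."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["\u00d7\u013b", "\u00d7\u0137", "\u00e4\u00b8\u012b", "\u00e3\u0124\u012b"]:
        print(repr(tok), token_to_bytes(tok).hex(), decode_token(tok))
```

Read this way, ×Ļ and ×ķ are the two-byte sequences D7 99 and D7 95 (Hebrew letters), while ä¸ī and ãĤī are three-byte CJK and kana sequences, which is consistent with the explanations listed above. A token such as the lone × is only the first byte of a two-byte sequence, so it does not decode to a character by itself. Entries that already appear as ordinary letters (ת, ר, κ) fall outside the byte table and would raise `KeyError` in this sketch.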
Activations Density 0.008%