Explanations
non-English characters or symbols in the text
Negative Logits
¶ģ       -0.16
¦¬       -0.15
viron    -0.15
ếu       -0.15
s        -0.14
/fw      -0.14
aset     -0.13
obook    -0.13
¯¼       -0.13
Rich     -0.13
Positive Logits
Ģ        0.25
Ĵ        0.21
ģ        0.20
Ħ        0.19
ı        0.18
ĥ        0.17
IJ       0.17
İ        0.16
ĵ        0.16
Ŀ        0.16
Activations Density: 0.011%