Explanations
Very high activations on special or unusual characters in the text.
Negative Logits
ogan        -0.17
або         -0.16
fty         -0.15
loat        -0.14
á»ķi        -0.14
aines       -0.14
arges       -0.14
rana        -0.14
ÑĥÑģÑĤа     -0.14
cede        -0.14
Positive Logits
ï¸          0.20
¦           0.17
ï¸ı         0.17
Į           0.16
olle        0.15
ĥ           0.15
象          0.14
ibox        0.14
âĶģ         0.14
idor        0.14
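The dashboard does not say how these two lists are produced. A common convention for feature dashboards, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and list the tokens with the largest and smallest resulting logit effects. The pure-Python sketch below follows that assumption; `decoder_dir`, `unembed`, `vocab`, and `top_logit_effects` are hypothetical names, not part of any specific tool.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return (positive, negative) lists of (token, effect) pairs.

    Assumption: decoder_dir is the feature's direction in the residual
    stream (length d_model) and unembed is a d_model x vocab_size matrix
    given as a list of rows. Both names are hypothetical.
    """
    d_model = len(decoder_dir)
    # Logit effect of the feature on each vocabulary token: decoder_dir @ unembed.
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first, like "Negative Logits"
    positive = ranked[::-1][:k]  # most positive first, like "Positive Logits"
    return positive, negative


if __name__ == "__main__":
    pos, neg = top_logit_effects(
        [1.0, 0.0],
        [[0.2, -0.1, 0.05], [9.0, 9.0, 9.0]],
        ["a", "b", "c"],
        k=1,
    )
    print(pos, neg)
```

A dashboard would typically round these effects to two decimals for display, as in the lists above.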
Activation density: 0.012%
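Assuming "activation density" means the percentage of sampled tokens on which the feature's activation is above zero (the dashboard does not define it), a density of 0.012% corresponds to about 12 active tokens per 100,000. A minimal sketch under that assumption; `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: `activations` holds one feature activation per sampled token.
    """
    total = len(activations)
    if total == 0:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / total


if __name__ == "__main__":
    # 12 active tokens out of 100,000 sampled tokens.
    sample = [1.0] * 12 + [0.0] * 99_988
    print(f"{activation_density(sample):.3f}%")
```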