INDEX
Explanations
proper nouns, particularly names
Negative Logits
yiy      -0.15
Král     -0.15
ابل      -0.15
ری       -0.15
ähr      -0.14
avig     -0.14
ンク     -0.14
.safe    -0.14
丘       -0.14
기타     -0.14
Positive Logits
urette      0.17
ature       0.16
ollen       0.15
warf        0.15
transfer    0.15
Transfer    0.14
ac          0.14
-w          0.14
Lock        0.14
Transfer    0.14
Activations Density: 0.045%