INDEX
Explanations
unusual characters and symbols, potentially related to a specific language or writing system
non-standard characters or symbols
Negative Logits
oleon        -0.79
theless      -0.74
wagen        -0.66
ierrez       -0.66
ktop         -0.66
charm        -0.64
APS          -0.63
concede      -0.63
enegger      -0.62
ãĥ¼ãĥĨãĤ£    -0.62
Positive Logits
¹            1.30
ª            1.16
º            1.14
©¶æ¥µ        1.12
¨            1.12
±            1.12
¢            1.10
²            1.09
Ń            1.07
¥            1.02
Activations Density 0.026%
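
Several token strings in the logit lists above (for example `ãĥ¼ãĥĨãĤ£` or `¹`) look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The page does not say which tokenizer the model uses, so the following is only a sketch under the assumption of a GPT-2-style byte-to-character table; the helper names `token_to_bytes` and `is_continuation_byte` are hypothetical and introduced here for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2's byte-level BPE.

    Assumption: the model behind this page uses this mapping. The page
    itself does not confirm it.
    """
    # Bytes that already have a printable, non-whitespace character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: vocabulary character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Hypothetical helper: map a vocabulary string back to its raw bytes."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def is_continuation_byte(b: int) -> bool:
    """True if b has the form 10xxxxxx, i.e. a UTF-8 continuation byte."""
    return 0x80 <= b <= 0xBF


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĨãĤ£", "¹", "ª", "Ń", "oleon"]:
        raw = token_to_bytes(tok)
        # errors="replace" keeps partial UTF-8 sequences printable.
        print(repr(tok), raw.hex(" "), repr(raw.decode("utf-8", errors="replace")))
```

If the assumption holds, `ãĥ¼ãĥĨãĤ£` maps back to the UTF-8 bytes of the katakana string `ーティ`, and single-character entries such as `¹` map to lone bytes in the 0x80 to 0xBF range, which cannot be decoded on their own.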