Explanations
words with special characters or specific grammatical features
Negative Logits
ブリ        -0.16
jí          -0.16
것으로      -0.15
izard       -0.15
ansom       -0.14
rvine       -0.14
ué          -0.14
[d          -0.14
izza        -0.14
-extra      -0.14
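Several tokens in this list appear to come from a GPT-2-style byte-level BPE vocabulary. This is an assumption, because the page does not name the model. Such a vocabulary stores every byte as a printable Unicode character, so in raw vocabulary form the Japanese token ブリ is written ãĥĸãĥª, jí is written jÃŃ, and 것으로 is written ê²ĥìľ¼ë¡ľ. The sketch below rebuilds GPT-2's byte-to-Unicode table and inverts it to recover the UTF-8 text. `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each byte (0-255) to a printable character."""
    # Printable Latin-1 bytes map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (whitespace, control chars, soft hyphen) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a raw byte-level vocab string into readable text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that hold only part of a multi-byte character are not valid UTF-8 alone.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥĸãĥª", "jÃŃ", "ê²ĥìľ¼ë¡ľ", "izard"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Because a single token can hold an incomplete multi-byte sequence, `errors="replace"` keeps the helper from raising on such fragments.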
Positive Logits
mind        0.25
az          0.24
Mind        0.21
ann         0.18
Az          0.17
meg         0.17
mind        0.17
ez          0.17
OTH         0.17
recip       0.16
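Dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and reading off the largest and smallest entries. The page does not state its exact method, so the sketch below is an assumption: it ignores any final LayerNorm scaling, and `w_dec`, `W_U`, and `top_logit_effects` are hypothetical names with made-up toy inputs.

```python
import numpy as np


def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    w_dec: feature decoder direction, shape (d_model,)
    W_U:   unembedding matrix, shape (d_model, d_vocab)
    vocab: list of d_vocab token strings
    """
    logits = w_dec @ W_U          # direct effect of the feature on every vocab logit
    order = np.argsort(logits)    # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up numbers (d_model = d_vocab = 3, identity unembedding).
vocab = ["mind", "ブリ", "az"]
W_U = np.eye(3)
w_dec = np.array([0.25, -0.16, 0.24])
positive, negative = top_logit_effects(w_dec, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

With a real model, `k=10` would give two ten-entry lists like the ones shown here; the values are direct logit contributions per unit of feature activation, not probabilities.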
Activation Density: 0.000%
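Activation density is usually reported as the fraction of sampled tokens on which the feature's activation is above zero. That definition is assumed here, since the page states neither its threshold nor its sample size. Under it, a displayed 0.000% means the measured fraction rounds to zero at three decimal places, that is, it is below 0.0005% of sampled tokens (possibly exactly zero). The sketch uses a hypothetical `activation_density` helper and made-up activations.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())


# Made-up activations: the feature fires on 1 token out of a million.
acts = np.zeros(1_000_000)
acts[123] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.000%
```

Even a feature that fires once per million tokens displays as 0.000% at this precision, so the figure alone cannot distinguish a very sparse feature from one that never fired in the sample.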