INDEX
Explanations
features related to significant achievements or rankings
Negative Logits
ibil       -0.16
igsaw      -0.16
kart       -0.15
ahat       -0.14
odore      -0.14
kul        -0.14
(from      -0.13
Ober       -0.13
asar       -0.13
aghetti    -0.13
Positive Logits
داÙĨ       0.16
å¹         0.15
yor        0.15
wi         0.15
Hanna      0.14
jenter     0.14
SED        0.14
ãİ¡        0.14
uv         0.14
리ìĸ´      0.14
Activations Density 0.263%