Explanations
specific types of content in various languages
Negative Logits
ォ      -0.19
obo     -0.18
issa    -0.17
币      -0.17
ovi     -0.17
igh     -0.16
을      -0.16
anda    -0.16
imes    -0.16
opi     -0.16
Positive Logits
ng      0.20
lation  0.19
ngen    0.19
ght     0.19
zed     0.18
erten   0.18
śmy     0.18
افته    0.18
gn      0.17
rical   0.17
Activations Density 0.113%