INDEX
Explanations
proper nouns related to individuals and their roles or professions
Negative Logits
aju          -0.16
ông          -0.15
indo         -0.14
.sb          -0.14
буд          -0.14
oft          -0.13
Contained    -0.13
SPDX         -0.13
loses        -0.13
aren         -0.13
Positive Logits
is           0.28
adalah       0.25
是           0.24
是           0.24
是一         0.23
是我         0.23
isa          0.22
is           0.21
"is          0.20
は           0.20
Activations Density 0.073%