Explanations
instances of a specific word across multiple languages
Negative Logits
oud      -0.17
ones     -0.17
ONES     -0.16
gmt      -0.15
然       -0.15
>Main    -0.14
oods     -0.14
lad      -0.14
ibly     -0.14
ing      -0.14
Positive Logits
alic        0.29
ordin       0.27
ordinator   0.25
oper        0.23
ордин       0.23
OPER        0.21
ordinate    0.21
operator    0.21
нач         0.19
opers       0.19
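The two lists above rank vocabulary tokens by how strongly this feature pushes the model's output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's column of the unembedding matrix and sort the results. The sketch below uses plain Python lists; `logit_effects`, `top_logits`, the toy vocabulary, the numbers and the d_model x vocab_size matrix layout are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption (hypothetical layout): `unembed` has d_model rows and one
    column per vocabulary token, like a W_U matrix.
    """
    d_model = len(decoder_direction)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    effects: Dict[str, float] = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(decoder_direction[i] * unembed[i][j] for i in range(d_model))
    return effects


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy example (hypothetical numbers): d_model = 2, vocabulary of 4 tokens.
vocab = ["ordin", "oper", "oud", "ones"]
unembed = [
    [1.0, 0.5, -1.0, 0.0],
    [0.0, 0.5, 0.0, -1.0],
]
direction = [0.3, 0.1]
effects = logit_effects(direction, unembed, vocab)
positive, negative = top_logits(effects, k=2)
print(positive)  # ordin, then oper
print(negative)  # oud, then ones
```

Sorting once and slicing from both ends keeps the two lists consistent with each other; a real dashboard would likely use a vectorised top-k over the full vocabulary instead of a Python loop.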
Activations Density 0.009%
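An activation density of 0.009% means the feature fires on roughly 9 of every 100,000 tokens. The sketch below assumes density is the percentage of token positions whose activation exceeds a threshold (here 0, a hypothetical default); the function name `activation_density` is illustrative and not taken from this page.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: density = firing positions / total positions * 100,
    with a hypothetical default threshold of 0.
    """
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * firing / total


# 9 firing positions out of 100,000 tokens.
density = activation_density([0.0] * 99_991 + [1.0] * 9)
print(f"{density:.3f}%")  # prints 0.009%
```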