INDEX
Explanations
elements related to logical operations
Negative Logits
s        -0.48
ه        -0.23
ska      -0.20
ister    -0.17
sik      -0.17
न        -0.17
sah      -0.17
sian     -0.16
p        -0.16
sie      -0.16
Positive Logits
ใจ       0.18
性的     0.18
aris     0.14
о        0.14
consc    0.14
atre     0.14
urma     0.14
Vien     0.14
ORY      0.14
""".     0.13
Activations Density 0.162%