Explanations
References to the concept of "from," indicating a focus on origins or sources.
Negative Logits
apons    -0.15
roe      -0.14
idan     -0.14
aya      -0.14
lub      -0.14
ربه      -0.14
/from    -0.14
ramer    -0.13
tero     -0.13
ê³       -0.13
Positive Logits
scratch   0.19
دواج      0.18
ĥģ        0.18
scratch   0.17
ース      0.16
éo        0.16
atatype   0.16
Pers      0.15
ån        0.15
stash     0.15
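The logit lists rank output tokens by how strongly this feature pushes their logits down or up. A minimal sketch of one common way such lists can be produced, assuming the feature's decoder direction is projected through the model's unembedding matrix and the results are sorted. The names `decoder_dir`, `unembed`, `vocab`, and `feature_logit_effects` are hypothetical, and the dashboard's actual computation may differ.

```python
from typing import List, Tuple


def feature_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # hypothetical unembedding, shape [d_model][vocab_size]
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction onto the vocabulary and return
    the k most negative and the k most positive (token, logit) pairs."""
    d_model = len(decoder_dir)
    # logits[j] = sum_i decoder_dir[i] * unembed[i][j]
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit effect.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy two-dimensional model with a three-token vocabulary (made-up numbers).
    neg, pos = feature_logit_effects(
        decoder_dir=[1.0, -0.5],
        unembed=[[0.2, -0.1, 0.0], [0.1, 0.4, -0.2]],
        vocab=["tok_a", "tok_b", "tok_c"],
        k=1,
    )
    print(neg, pos)
```

Sorting the full vocabulary once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort.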
Activation density: 0.129%
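Activation density is the share of evaluated tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than zero; the threshold and the function name `activation_density` are assumptions, not taken from the dashboard.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption about what counts as "active".
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up example: 129 active tokens out of 100,000 evaluated.
    acts = [1.0] * 129 + [0.0] * (100_000 - 129)
    print(f"{activation_density(acts):.3f}%")
```

Under this definition, a density of 0.129% corresponds to the feature firing on roughly 1 token in every 775.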