INDEX
Explanations
phrases indicating purpose or intention
Negative Logits
Token        Logit
æĵļ          -0.14
ữu           -0.14
746          -0.14
249          -0.14
Å¥           -0.14
coincidence  -0.13
melon        -0.13
alty         -0.13
osl          -0.13
udad         -0.13
Positive Logits
Token        Logit
meant        0.18
ander        0.17
intended     0.16
spir         0.15
Ø®ÙĪØ§ÙĨ     0.15
lander       0.15
é¼           0.15
arges        0.14
mund         0.14
ivid         0.14
Activations Density 0.021%