Explanations
phrases that convey expectations, desires, or conditions regarding various subjects
Negative Logits
-0.17  borg
-0.16  クト
-0.15  ện
-0.14  ----------------------------------------------------------------------------↵
-0.14  yms
-0.14  ì¡
-0.14  dued
-0.14  她的
-0.13  rive
-0.13  џ
Positive Logits
0.23  to
0.17  να
0.16  us
0.16  you
0.14  ونا
0.14  443
0.14  someone
0.14  상을
0.14  warts
0.14  shed
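Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The page does not state its exact method, so the following is only a minimal sketch under that assumption (it ignores any final layer norm); `logit_effects`, `decoder_direction`, and the toy unembedding values are hypothetical, chosen for illustration.

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_direction: List[float],
    unembedding: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: each token's score is the dot product between the feature's
    decoder direction and that token's unembedding column.
    """
    scores = {
        token: sum(d * w for d, w in zip(decoder_direction, column))
        for token, column in unembedding.items()
    }
    # Sort ascending so the most negative scores come first.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: a 2-dimensional residual stream and three tokens.
toy_unembedding = {"to": [0.23, 5.0], "borg": [-0.17, 1.0], "shed": [0.14, -2.0]}
neg, pos = logit_effects([1.0, 0.0], toy_unembedding, k=1)
print(neg)  # [('borg', -0.17)]
print(pos)  # [('to', 0.23)]
```

Ranking by a plain dot product keeps the sketch dependency-free; a real model would do the same projection as one matrix-vector product over the full vocabulary.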
Activation Density 0.167%
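Activation density is conventionally the fraction of sampled tokens on which the feature's activation is above zero, so 0.167% means the feature fires on roughly one token in 600. The page does not say which sample or threshold was used; this sketch assumes a plain count over a list of activations with a threshold of zero, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 167 firing tokens out of 100,000 gives a density of 0.167%.
sample = [1.0] * 167 + [0.0] * (100_000 - 167)
print(activation_density(sample))  # 0.00167
```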