Explanations
phrases indicating a consistent or habitual action
Negative Logits
often       -0.26
altogether  -0.23
often       -0.22
artık       -0.21
oft         -0.20
hardly      -0.19
seldom      -0.19
并不        -0.19
neither     -0.19
actually    -0.19
Positive Logits
been        0.22
cky         0.21
-on         0.19
-available  0.19
-ending     0.18
-present    0.17
以来        0.17
Been        0.17
wondered    0.17
ready       0.17
Activations Density 0.093%