INDEX
Explanations
phrases indicating intention or desire
Negative Logits
realise      -0.14
upon         -0.14
_refl        -0.14
recogn       -0.14
uxe          -0.14
Gam          -0.14
usterity     -0.14
fter         -0.13
Decide       -0.13
jk           -0.13
Positive Logits
know         0.28
hear         0.23
knows        0.22
hearing      0.22
Know         0.21
-know        0.21
知道         0.19
Know         0.19
ось          0.19
know         0.18
Activations Density 0.150%