Explanations
phrases that indicate the beginning or initiation of actions
Negative Logits
udden       -0.18
ndx         -0.16
sian        -0.14
ertools     -0.14
.mx         -0.14
_through    -0.14
ागत         -0.14
ilib        -0.14
oteric      -0.13
झ           -0.13
Positive Logits
off         0.24
somewhere   0.22
small       0.22
fresh       0.21
slow        0.21
right       0.21
wherever    0.21
simple      0.20
af          0.20
sentences   0.20
Activations Density 0.069%