INDEX
Explanations
verbs indicating influence or causation
Negative Logits
Yourself
-0.22
yourselves
-0.18
svůj
-0.16
yourself
-0.16
dued
-0.16
imers
-0.15
pid
-0.15
ชม
-0.14
ับม
-0.14
odata
-0.14
Positive Logits
us
0.28
them
0.20
him
0.18
不了
0.17
oire
0.15
-enable
0.15
me
0.15
you
0.15
.bundle
0.15
itself
0.14
Activations Density 0.519%
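The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled token positions
    where the feature is active. The dashboard may define it differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.519% would mean the feature is active on roughly 5 out of every 1,000 sampled tokens.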