INDEX
Explanations
negative contractions and phrases involving denial or negation
Negative Logits
Be      -0.24
be      -0.23
(be     -0.22
Be      -0.21
are     -0.20
be      -0.18
can     -0.17
may     -0.17
.Be     -0.16
(Be     -0.16
Positive Logits
need    0.37
deserve 0.32
seem    0.30
need    0.30
have    0.30
belong  0.28
HAVE    0.28
Need    0.27
_need   0.26
exist   0.25
Activations Density: 0.226%