INDEX
Explanations
negative responses or denials in conversational contexts
Negative Logits
Token          Logit
enko           -0.16
idis           -0.15
him            -0.15
Euras          -0.15
acs            -0.14
instead        -0.14
etc            -0.14
Ok             -0.14
ÂŃing          -0.14
untranslated   -0.14
Positive Logits
Token          Logit
absolutely     0.26
partly         0.21
partially      0.21
actually       0.21
Absolutely     0.20
sir            0.20
Absolutely     0.19
definitely     0.18
!              0.18
entirely       0.18
Activations Density 0.160%
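
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions at which the feature's activation is above zero; the function name, the threshold, and the sample data below are hypothetical and are not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 16 of them with a nonzero activation.
sample = [0.0] * 9984 + [1.5] * 16
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.160%
```

Under this reading, a density of 0.160% would mean the feature is active on roughly 16 of every 10,000 tokens.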