INDEX
Explanations
instances of saying "no" or refusal in conversation
Negative Logits
phin        -0.15
ophage      -0.14
casecmp     -0.14
ายน         -0.14
adı         -0.14
isFunction  -0.14
instein     -0.14
щий         -0.14
ета         -0.14
rell        -0.13
Positive Logits
NO          0.27
yes         0.26
YES         0.25
Yes         0.24
yes         0.24
YES         0.23
NO          0.20
no          0.19
Yes         0.19
_NO         0.19
Activations Density 0.052%
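
An activation density like the one above is usually read as the share of sampled tokens on which the feature fires at all. The sketch below shows that arithmetic under that assumption; it is not the dashboard's own code, and `activation_density`, its `threshold` parameter, and the sample data are hypothetical names and values chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 13 active tokens out of 25,000 sampled tokens.
sample = [1.0] * 13 + [0.0] * 24_987
density = activation_density(sample)
print(f"{density:.3f}%")
```

With these made-up numbers the feature is active on 13 of 25,000 tokens, which matches a density of 0.052%, or roughly one token in every 1,900.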