INDEX
Explanations
instances of questions and affirmative responses in dialogue
Negative Logits
Token          Logit
.Apis          -0.17
daq            -0.15
ÏĨι            -0.14
andom          -0.14
asl            -0.14
Ì£             -0.14
리ì§Ģ          -0.14
.cloudflare    -0.13
è§             -0.13
Birds          -0.13
Positive Logits
Token          Logit
yes            0.25
yes            0.22
Yes            0.22
res            0.22
YES            0.21
‘              0.19
Maybe          0.19
Hell           0.19
NO             0.19
Yes            0.19
Activations Density: 0.076%