INDEX
Explanations
affirmations or confirmations in the text
Negative Logits
aldi    -0.19
_LSB    -0.18
ä¸įäºĨ    -0.17
ÑĨионнÑĭй    -0.15
оÑı    -0.15
doesn    -0.14
à¹Ħม    -0.14
nesc    -0.14
niet    -0.14
नह    -0.14
Positive Logits
yes    0.36
yes    0.35
indeed    0.34
Yes    0.28
Yes    0.26
Yep    0.26
Yep    0.26
Indeed    0.24
Indeed    0.23
=yes    0.23
Activations Density 0.066%
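
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below is a minimal illustration under that assumption; the function name `activation_density`, the zero threshold, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (number of sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active tokens out of 3000 (made-up values).
sample = [0.0] * 2998 + [1.2, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well below 1%, like the one reported here, would mean the feature is silent on the vast majority of tokens and fires only on a narrow set of contexts.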