INDEX
Explanations
affirmative responses and expressions of capability or permission
NEGATIVE LOGITS
ãĥ¼ãĥ«    -0.16
oad       -0.16
á»ı       -0.14
íĥ        -0.14
[out      -0.14
mys       -0.14
áci       -0.14
andom     -0.14
ichert    -0.14
ici       -0.14
POSITIVE LOGITS
YES       0.25
yes       0.24
indeed    0.24
yes       0.23
inde      0.22
Yes       0.20
Yes       0.18
YES       0.18
Indeed    0.17
=yes      0.17
Activations Density 0.099%