Explanations
rhetorical questions and expressions of disbelief or skepticism
Negative Logits
chie     -0.17
ób       -0.16
ンジ     -0.16
ascript  -0.15
enie     -0.15
ucle     -0.15
htar     -0.14
radan    -0.14
elog     -0.14
elly     -0.14
Positive Logits
yes      0.56
Yes      0.50
YES      0.49
Yes      0.45
yes      0.43
YES      0.40
Nope     0.35
_YES     0.33
=yes     0.33
.Yes     0.32
Activation Density: 0.137%
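Activation density is the share of tokens on which the feature fires. A value of 0.137% corresponds to roughly 137 of every 100,000 tokens. The sketch below assumes a feature counts as active when its activation is strictly greater than a threshold of 0. The function name `activation_density` and the toy activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    # Percentage of tokens whose feature activation exceeds the threshold
    # (assumed active-if-positive rule).
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy activations over 10 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.2]
print(activation_density(acts))  # 20.0
```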