Explanations
This neuron activates on an explicit “Yes” or “No” answer at the start of a fact-consistency response.
Negative Logits
Token         Logit
“But          -0.08
"But          -0.07
Strings       -0.06
ora           -0.06
inet          -0.06
ablytyped     -0.06
“So           -0.06
Seb           -0.06
елю           -0.06
This          -0.06

Positive Logits
Token         Logit
vois          0.07
.templates    0.06
�             0.06
-zone         0.06
/md           0.06
evidenced     0.06
Kı            0.06
solidarity    0.06
]] ↵          0.06
(jQuery       0.06
Activations Density: 0.011%
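The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the neuron's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the example data are hypothetical and chosen only to illustrate the arithmetic.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 11 of which activate the neuron.
example = [0.0] * 99_989 + [1.5] * 11
density = activation_density(example)
print(f"{density:.3%}")  # prints 0.011%
```

Under that assumption, a density of 0.011% corresponds to roughly 11 active positions per 100,000 tokens, which would fit a neuron that fires only on a narrow response-opening pattern.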