Explanations
This neuron detects short affirmative response tokens, such as “yes”-type replies expressing agreement.
Negative logits
Token        Logit
348          -0.07
що           -0.07
storia       -0.07
perience     -0.07
Anthrop      -0.07
cedure       -0.06
führ         -0.06
noir         -0.06
citt         -0.06
placement    -0.06
Positive logits
Token        Logit
yes          0.12
YES          0.09
Yes          0.09
Yes          0.09
"Yes         0.08
sí           0.07
positive     0.07
.Yes         0.07
RS           0.07
yAxis        0.07
Activation density: 0.037%
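Activation density is the share of sampled tokens on which the neuron fires. A density of 0.037% works out to roughly 37 activating tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero. The function name and the default threshold are hypothetical, since the dashboard does not state how the figure is computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the neuron is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```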