Explanations
The neuron detects meta-instructions or system-level directives telling the assistant how to format or complete its answer.
Negative Logits
Token       Logit
burada      -0.07
_end        -0.07
biased      -0.07
phối        -0.07
uran        -0.06
achte       -0.06
.books      -0.06
vail        -0.06
assigned    -0.06
ises        -0.06
Positive Logits
Token       Logit
Kingdom     0.07
kab         0.07
blended     0.06
lstm        0.06
_TBL        0.06
microsoft   0.06
REM         0.06
总          0.06
(\<         0.06
chocolate   0.06
Activations Density 0.006%
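
The page does not say how the density figure is computed. As a rough sketch, assuming it means the percentage of sampled tokens on which the neuron's activation is above zero, the calculation could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of active tokens in a sample;
    the source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 6 active tokens out of 100,000 (made-up values).
sample = [0.0] * 99_994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under that reading, a density of 0.006% would mean the neuron fires on roughly 6 of every 100,000 tokens, which is consistent with a feature tied to a narrow pattern such as formatting directives.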