INDEX
Explanations
Dialogue
This neuron activates on the special metadata/control tokens (e.g. header-ID, begin/end markers, speaker tags) that structure the conversation rather than on ordinary content words.
Instructions that set covert goals for roleplay: subtly steering the conversation toward a hidden agenda, or eliciting help or money without stating it directly.
Negative Logits
Token            Logit
!?               -0.07
vanized          -0.06
Drivers          -0.06
.opendaylight    -0.06
Similar          -0.06
文件             -0.06
Ramirez          -0.06
apesh            -0.05
Sanity           -0.05
عبدال            -0.05
Positive Logits
Token            Logit
Anglo            0.07
.pages           0.07
.intersection    0.07
ایش              0.06
UN               0.06
panc             0.06
.window          0.06
icans            0.06
igma             0.06
.report          0.06
Activations Density: 0.028%
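For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes that "activations density" means the fraction of token positions on which the neuron's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of positions where the neuron fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which fire.
sample = [0.0] * 9997 + [0.4, 1.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.030%"
```

Under this reading, a density of 0.028% would mean the neuron fires on roughly 28 out of every 100,000 tokens, which fits a neuron that responds only to a narrow class of inputs.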