Explanations
This neuron activates on speaker-attribution language, especially reporting and quotation verbs such as “said” and “explained,” along with similar attribution cues.
Negative Logits
Token          Logit
fort           -0.07
困             -0.06
lpVtbl         -0.06
Manus          -0.06
Sist           -0.06
donn           -0.06
Lingu          -0.06
완             -0.06
866            -0.06
территории     -0.06
Positive Logits
Token          Logit
Published       0.07
leanup          0.06
Converter       0.06
.Health         0.06
JI              0.06
iness           0.06
preocup         0.06
Коли            0.06
.ct             0.06
ervation        0.06
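The page lists the tokens whose logits this neuron most raises and lowers, but it does not say how those values were computed. One common approach, assumed here rather than confirmed by the source, is to take the dot product of the neuron's output direction with each token's unembedding vector and rank the results. The sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and toy two-dimensional vectors in place of real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    neuron_out: Sequence[float],          # hypothetical: the neuron's output direction (length d_model)
    unembed: Dict[str, Sequence[float]],  # hypothetical: token -> unembedding vector (length d_model)
) -> Dict[str, float]:
    """Direct effect of the neuron on each token's logit: a plain dot product."""
    return {
        tok: sum(w * u for w, u in zip(neuron_out, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) tokens, each list ordered by magnitude."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy weights for illustration only; not taken from any real model.
    toy_unembed = {"said": [0.3, 0.1], "fort": [-0.2, 0.4], "Published": [0.5, -0.2]}
    effects = logit_effects([1.0, -0.5], toy_unembed)
    positive, negative = top_and_bottom(effects, k=1)
    print("positive:", positive)
    print("negative:", negative)
```

With real weights, `neuron_out` would be the neuron's row of the MLP output matrix and `unembed` would cover the full vocabulary. Sorting the whole vocabulary is simple and fine for a one-off computation; `heapq.nlargest` / `heapq.nsmallest` would avoid the full sort if only a few tokens are needed.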
Activation Density: 0.057%
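Activation density is presumably the fraction of token positions in the evaluation corpus on which the neuron fires; at 0.057%, that is roughly 57 of every 100,000 tokens. The page does not state its threshold or corpus, so the sketch below assumes a position counts as active when the activation is strictly greater than a hypothetical `threshold` parameter (default 0.0). The function name `activation_density` is illustrative, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: "active" means strictly above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Toy data: one firing position out of 10,000.
    acts = [0.0] * 9999 + [2.3]
    print(f"{activation_density(acts):.3%}")
```

Running the example prints `0.010%`, i.e. one active position in 10,000; a neuron at 0.057% would fire about 5.7 times as often.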