Explanations
instructions
The neuron fires on tokens that introduce notes, cautions, or instructions (e.g. “Note,” “Ensure,” “Please,” “Although,” “While”). These tokens act as editorial or directive cues.
Negative Logits
(vs           -0.08
(le           -0.07
mind          -0.07
vy            -0.07
CSI           -0.06
appropriate   -0.06
(skip         -0.06
bitcoins      -0.06
(it           -0.06
fly           -0.06
Positive Logits
ciudad        0.07
há            0.07
blackColor    0.06
/inc          0.06
-scripts      0.06
../../        0.06
콘            0.06
illos         0.06
襲            0.06
Muj           0.06
Activations Density 0.070%
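
The density figure above can be read as the share of tokens on which the neuron is active. The sketch below illustrates that reading only. It assumes density means "fraction of tokens with activation above a threshold", which this page does not state, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    is active. The source page does not define the metric.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 tokens, 7 of which activate the neuron.
sample = [0.0] * 9993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.070%
```

Under this reading, a density of 0.070% corresponds to roughly 7 active tokens per 10,000, so the neuron is sparse.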