INDEX
Explanations
This neuron fires on speaker- or character-identifier tokens (e.g., “NAME_1”, “NAME_5”).
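As a rough illustration of the kind of token the explanation describes, the sketch below picks identifier placeholders out of a transcript-style string. It assumes the identifiers follow the `NAME_<digits>` shape of the two examples above; the pattern, the helper name `find_identifiers`, and the sample text are hypothetical and are not taken from the neuron's actual activation data.

```python
import re

# Assumption: identifiers look like NAME_<digits>, as in "NAME_1" and "NAME_5".
PLACEHOLDER = re.compile(r"\bNAME_\d+\b")

def find_identifiers(text: str) -> list[str]:
    """Return every NAME_<n> placeholder in the order it appears."""
    return PLACEHOLDER.findall(text)

# Hypothetical sample, made up for illustration only.
sample = "NAME_1: Hi there. NAME_5: Hello, NAME_1!"
identifiers = find_identifiers(sample)
print(identifiers)
```

A real tokenizer may split such a placeholder into several sub-tokens, so the neuron could be responding to only part of each match.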
Negative Logits
pedal            -0.07
Province         -0.07
distortion       -0.07
--; ↵            -0.07
��               -0.07
outline          -0.06
.creator         -0.06
eč               -0.06
<|python_tag|>   -0.06
position         -0.06
Positive Logits
EZ             0.07
_removed       0.07
_INCREMENT     0.06
dinosaur       0.06
없었           0.06
místě          0.06
return         0.06
Emm            0.06
Zoe            0.06
.stereotype    0.06
Activations Density 0.011%
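To put the density figure in perspective, here is a minimal sketch that assumes "activations density" means the percentage of tokens on which the neuron has a nonzero activation; the page itself does not define the term. The helper `density_percent` and the toy activation list are hypothetical.

```python
def density_percent(activations: list[float]) -> float:
    """Percentage of tokens with a strictly positive activation."""
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)

# Toy data, made up for illustration: one active token out of four.
toy = [0.0, 0.0, 0.0, 1.5]
print(density_percent(toy))

# Under the same assumption, a density of 0.011% corresponds to
# about one active token in every 100 / 0.011 tokens.
tokens_per_activation = 100 / 0.011
print(round(tokens_per_activation))
```

Read this way, the neuron would be active on roughly one token in nine thousand, which fits a feature tied to a narrow class of tokens.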