Explanations
punctuation
This neuron detects assistant-generated text (tokens marking the assistant's responses).
Negative Logits
坂          -0.07
lava        -0.07
stitched    -0.07
_modifier   -0.07
⻝          -0.07
哪裡        -0.07
besten      -0.07
משמעות      -0.07
spd         -0.07
Venus       -0.07
Positive Logits
릇          0.07
nw          0.07
�           0.07
Rule        0.07
.Types      0.06
Res         0.06
Campaign    0.06
דבר         0.06
起到了      0.06
晁          0.06
Activations Density 0.125%
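
As a rough illustration of what a density figure like this could mean, here is a minimal sketch. It assumes (the page does not say so) that density is the percentage of sampled tokens on which the neuron's activation exceeds a threshold of zero. The function name activation_density and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    neuron fires; the source page does not define the term.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Toy data (hypothetical): 1 active token out of 800 sampled tokens.
toy = [0.0] * 799 + [1.3]
print(f"{activation_density(toy):.3f}%")
```

Under that assumption, a density of 0.125% would correspond to the neuron firing on about 1 in every 800 sampled tokens.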