INDEX
Explanations
questions/dialogue
The neuron responds to tokens in first-person or quoted internal thoughts (e.g. the opening quotation mark and words like “I,” “thought,” “sure” in introspective statements).
Negative Logits
ється    -0.06
vscode   -0.06
Vám      -0.06
ádu      -0.06
_yaw     -0.06
ADV      -0.06
realm    -0.06
cry      -0.06
lij      -0.06
.Card    -0.06
Positive Logits
'; ↵         0.07
vag          0.07
质           0.06
質           0.06
uate         0.06
staging      0.06
correctness  0.06
hourly       0.06
Hispan       0.06
-‐           0.06
Activations Density 0.004%