INDEX
Explanations
descriptive adjectives
The neuron fires on the placeholder token “NAME_1”, i.e. it detects that specific NAME_# placeholder wherever it appears.
Negative Logits
Token     Logit
-tra      -0.07
业        -0.07
_joint    -0.07
Hast      -0.07
Equ       -0.07
_BT       -0.06
answer    -0.06
廣        -0.06
TP        -0.06
Hu        -0.06
Positive Logits
Token       Logit
?action     0.07
ження       0.06
>');↵↵      0.06
amort       0.06
політи      0.06
teleport    0.06
(orig       0.06
ilişk       0.06
ucking      0.06
drink       0.06
Activations Density 0.076%
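
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions on which the neuron's activation exceeds a threshold; that definition, the function name `activation_density`, the zero threshold, and the toy data are all illustrative assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the neuron is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 8 of which activate the neuron.
toy_activations = [0.0] * 9992 + [1.3] * 8
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.076% would mean the neuron is active on roughly 76 out of every 100,000 sampled tokens, which fits a neuron tied to one specific placeholder token.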