Explanations
conversational
This neuron flags user queries: it activates on tokens that appear in the user’s questions.
Negative Logits
AIL             -0.07
Du              -0.07
URING           -0.07
기              -0.07
contar          -0.07
Serialize       -0.06
ONS             -0.06
asil            -0.06
Nikki           -0.06
vous            -0.06

Positive Logits
tog             0.07
(shift          0.06
ังม             0.06
státy           0.06
hosts           0.06
putt            0.06
subparagraph    0.06
rubbish         0.06
(CONFIG         0.06
dez             0.06
Activations Density 0.063%
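
One plausible reading of the density figure is the percentage of token positions on which the neuron fires at all. The page does not state the exact definition, so the sketch below assumes "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the sample data are hypothetical and serve only as illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is strictly
    greater than the threshold (0 by default). The source page does not
    specify the definition it uses.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 6 of which are active.
sample = [0.0] * 9994 + [0.8, 1.2, 0.3, 2.1, 0.5, 0.9]
print(f"{activation_density(sample):.3f}%")  # prints 0.060%
```

Under this reading, a density of 0.063% would mean the neuron is active on roughly 6 of every 10,000 tokens, i.e. it is very sparse.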