INDEX
Explanations
verifying
The neuron fires on polite, procedural agent prompts/offers of help (e.g. “May I have…,” “Let me check…,” “I’m sorry…”).
Negative Logits
venient    -0.07
-play      -0.07
Hast       -0.06
Fin        -0.06
808        -0.06
.writeInt  -0.06
padding    -0.06
.x         -0.06
会         -0.06
LOGY       -0.06
Positive Logits
isinde     0.07
ocre       0.06
entario    0.06
enser      0.06
           0.06
rå         0.06
Boyle      0.06
_finished  0.06
dir        0.06
(summary   0.06
Activations Density 0.022%