Explanations
questions and responses
This neuron detects assistant-produced text: assistant-role turns (the assistant's replies) and self-referential or corrective utterances.
Negative Logits
ܢ    -0.07
ﻢ    -0.07
Viewport    -0.07
.models    -0.07
روم    -0.07
Woo    -0.06
ء    -0.06
חלב    -0.06
Pou    -0.06
е    -0.06
Positive Logits
oklyn    0.08
uição    0.07
Mom    0.07
)dealloc    0.07
Aboriginal    0.07
笼罩    0.07
auf    0.07
AtIndex    0.07
decisive    0.07
räg    0.07
Activations Density 0.074%