Explanations
This neuron activates on words used to ask for clarification or more information: tokens such as “understand,” “better,” and “please” that signal a polite request for details.
Negative Logits
.sys    -0.07
ngOn    -0.07
startPos    -0.06
Гол    -0.06
Если    -0.06
commentary    -0.06
_storage    -0.06
>()↵↵    -0.06
vertically    -0.06
provisional    -0.06
Positive Logits
디자인    0.07
actividades    0.07
动    0.07
ーズ    0.07
pair    0.07
illuminate    0.07
classes    0.07
_MT    0.06
゚    0.06
_ros    0.06
Activations Density 0.014%