INDEX
Explanations
analysis
This neuron detects summary/discourse phrases that introduce findings or conclusions, such as “From this analysis” or “From this study.”
Negative Logits
Token            Logit
benchmark        -0.07
lumber           -0.06
hammer           -0.06
.intersection    -0.06
_robot           -0.06
implant          -0.06
�                -0.06
ีการ             -0.06
Hakk             -0.06
陆               -0.06
Positive Logits
Token            Logit
сна              0.07
272              0.07
Й                0.06
český            0.06
ový              0.06
ernen            0.06
starring         0.06
holy             0.06
_Character       0.06
OPEN             0.06
Activations Density: 0.031%