INDEX
Explanations
The neuron consistently activates on the token “trunk”, including inflected forms such as “trunked”, regardless of context.
Negative Logits
distributors  -0.07
atings        -0.07
_syntax       -0.07
Screens       -0.06
arenas        -0.06
Eval          -0.06
},"           -0.06
Beat          -0.06
cách          -0.06
Nay           -0.06
Positive Logits
trunk      0.14
/trunk     0.08
torso      0.08
plank      0.07
usk        0.07
ken        0.07
Root       0.07
recom      0.07
unde       0.07
üzerinden  0.07
Activations Density 0.002%
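
Lists like the negative and positive logits above are commonly obtained by projecting a neuron's output direction through the model's unembedding matrix and sorting the result. The page does not state how its values were computed, so the following is only a minimal sketch under that assumption. The function name `top_logit_effects`, the arrays `w_out` and `W_U`, and the toy vocabulary are all hypothetical and use made-up numbers, not this neuron's data.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Return (most_negative, most_positive) lists of (token, effect) pairs.

    w_out : (d_model,) hypothetical output direction of one neuron
    W_U   : (d_model, vocab_size) hypothetical unembedding matrix
    vocab : list of vocab_size token strings
    """
    effects = w_out @ W_U            # per-token logit effect, shape (vocab_size,)
    order = np.argsort(effects)      # indices sorted by ascending effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with invented values (assumption: 2-dim residual stream, 4 tokens).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_out = np.array([1.0, 0.0])
W_U = np.array([
    [0.5, 0.25, -0.25, -0.5],
    [0.0, 0.0, 0.0, 0.0],
])

negative, positive = top_logit_effects(w_out, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, a token at the top of the positive list is one whose logit the neuron pushes up most when it fires, and the negative list holds the tokens it pushes down most.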