INDEX
Explanations
This neuron activates on the tokens that make up placeholder user names (patterns such as “NAME_1”, “NAME_2”, and so on).
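To make the surface pattern in that explanation concrete, here is a minimal sketch that finds such placeholders in raw text. It assumes the placeholders always have the form `NAME_` followed by digits; the names `PLACEHOLDER_RE` and `find_placeholders` are hypothetical and chosen only for illustration. The neuron itself responds to model tokens, not to a regular expression, so this only approximates the strings it is described as firing on.

```python
import re

# Assumption: a placeholder is "NAME_" followed by one or more digits,
# and is not embedded in a longer word (so "USERNAME_1" does not count).
PLACEHOLDER_RE = re.compile(r"\bNAME_\d+\b")


def find_placeholders(text: str) -> list[str]:
    """Return every placeholder user name found in text, in order of appearance."""
    return PLACEHOLDER_RE.findall(text)


if __name__ == "__main__":
    sample = "NAME_1: Hi there! NAME_2: Hello, NAME_1."
    print(find_placeholders(sample))  # ['NAME_1', 'NAME_2', 'NAME_1']
```

A tokenizer may split one placeholder into several tokens (for example the letters, the underscore, and the digit), which would be consistent with the explanation's wording "tokens forming" the name rather than a single token.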
Negative Logits
ICY        -0.07
Moz        -0.06
ena        -0.06
detained   -0.06
>`;↵       -0.06
Hoàng      -0.06
ontology   -0.06
Pages      -0.06
aj         -0.06
ざ         -0.06
Positive Logits
(actual      0.07
_SEGMENT     0.07
POINT        0.06
инвести      0.06
wager        0.06
SB           0.06
bets         0.06
cleanup      0.06
observable   0.06
Additional   0.06
Activations Density 0.020%