INDEX
Explanations
Code/Non-sentences
The neuron activates on temporal markers—specifically month names or abbreviations (and related season labels).
Negative Logits
(token not shown)  -0.07
_RUNNING           -0.07
juven              -0.06
out                -0.06
airline            -0.06
ателей             -0.06
indrome            -0.06
ology              -0.06
vote               -0.06
tenants            -0.06
Positive Logits
scrap     0.07
变化      0.06
>>;↵      0.06
Asian     0.06
BT        0.06
__,↵      0.06
.arc      0.06
�         0.06
}↵↵↵↵     0.06
>(*       0.06
Activations Density 0.202%
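As a rough illustration of what an activation density figure like the one above could mean, the sketch below computes the fraction of token positions on which a neuron fires. It assumes density is defined as "activation strictly above a threshold of zero", which this page does not state; the function name `activation_density` and the sample activations are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with activation > threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, only 2 of which activate the neuron.
acts = [0.0] * 998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.200%"
```

Under this reading, a density of 0.202% would correspond to the neuron firing on roughly 2 out of every 1000 tokens, which would be consistent with a neuron that responds to a narrow class of tokens.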