INDEX
Explanations
common articles/pronouns
This neuron activates on the word “In” when it begins a new sentence or paragraph, marking sentence‐initial discourse transitions.
Negative Logits
Token       Logit
Sharing     -0.07
Syn         -0.07
relax       -0.06
eding       -0.06
Play        -0.06
sd          -0.06
asting      -0.06
n           -0.06
Playable    -0.06
Spot        -0.06
Positive Logits
Token        Logit
hait         0.07
む           0.07
(lambda      0.07
not          0.07
乌           0.06
;;↵          0.06
Fortunately  0.06
puede        0.06
Translatef   0.06
pla          0.06
Activations Density 0.219%
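
For readers unfamiliar with the density figure, here is a minimal sketch of one way to compute it. It assumes density means the fraction of sampled token positions where the neuron's activation exceeds a threshold; the page does not state its exact definition. The function name `activation_density`, the zero threshold, and the sample counts (219 active positions out of 100,000) are hypothetical and chosen only to reproduce the displayed percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    neuron fires at all; the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 219 of them active.
sample = [0.0] * 99_781 + [1.0] * 219
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on the vast majority of tokens, so its top activating examples come from a small slice of the data.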