INDEX
Explanations
The neuron activates strongly on capitalized function words that open a new sentence (e.g. “The,” “In,” “For,” “It,” “One,” “Thus”).
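As a rough illustration of the pattern this explanation describes, the sketch below flags sentence-initial openers in plain text. It is a string-level heuristic, not the neuron itself: the word list is taken only from the examples above, the sentence-boundary rule (start of text, or whitespace following ".", "!" or "?") is an assumption, and the names `SENTENCE_OPENERS` and `sentence_initial_openers` are hypothetical.

```python
import re

# Hypothetical word list: only the examples quoted in the explanation.
SENTENCE_OPENERS = {"The", "In", "For", "It", "One", "Thus"}

# Assumed boundary rule: a sentence starts at the beginning of the text
# or after '.', '!' or '?' followed by whitespace.
_SENTENCE_START = re.compile(r"(?:^|(?<=[.!?])\s+)([A-Z][a-z]*)")


def sentence_initial_openers(text):
    """Return (character offset, word) pairs for listed openers that begin a sentence."""
    hits = []
    for match in _SENTENCE_START.finditer(text):
        word = match.group(1)
        if word in SENTENCE_OPENERS:
            hits.append((match.start(1), word))
    return hits


if __name__ == "__main__":
    sample = "The cat sat. In the end, it slept. Thus it ended."
    for offset, word in sentence_initial_openers(sample):
        print(offset, word)
```

A capitalized word in mid-sentence (such as "The" in "He saw The Hague.") is not flagged, which matches the explanation's emphasis on sentence starts rather than capitalization alone.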
Negative Logits
Ав
-0.08
VN
-0.07
PY
-0.07
?).
-0.06
undead
-0.06
bitterness
-0.06
Az
-0.06
女性
-0.06
vend
-0.06
Mini
-0.06
Positive Logits
EntryPoint
0.07
Adjustment
0.06
_compiler
0.06
.getChild
0.06
presenting
0.06
;",
0.06
-directed
0.06
}/#{
0.06
_clock
0.06
.boost
0.06
Activations Density 0.068%
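The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the neuron's activation exceeds a threshold (zero here), might look like the following. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means the share of sampled tokens on which the
    neuron fires at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Illustrative data only: 68 active tokens out of 100,000.
    sample = [0.0] * 99_932 + [1.0] * 68
    print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.068% would correspond to roughly 68 active tokens per 100,000 sampled.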