INDEX
Explanations
This neuron fires on capitalized names and titles (proper nouns), especially book and series titles.
Negative Logits
KP           -0.07
} ↵ ↵ ↵ ↵    -0.07
anth         -0.06
INTERN       -0.06
TIME         -0.06
edula        -0.06
alert        -0.06
龍           -0.06
ADR          -0.06
표           -0.06
Positive Logits
eslint       0.07
_EV          0.07
Albums       0.06
-induced     0.06
subtree      0.06
(quantity    0.06
프로그램     0.06
,data        0.06
register     0.06
ическая      0.06
Activations Density 0.025%
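
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of sampled tokens on which the neuron's activation exceeds a threshold; the page does not state its exact definition. The function name `activation_density`, the threshold of zero, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read here as active tokens / total tokens.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1 active token among 4000 sampled tokens.
sample = [0.0] * 3999 + [1.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.025% means the neuron is active on roughly one token in every 4000, which fits a feature tied to a narrow class of tokens.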