Explanations
The neuron activates on proper nouns, such as names of people, brands, and models.
Negative Logits
Viktor      -0.07
шкі         -0.07
き          -0.06
integral    -0.06
weniger     -0.06
architect   -0.06
RTS         -0.06
ріш         -0.06
�           -0.06
kry         -0.06
Positive Logits
Amateur     0.06
(op         0.06
vrouw       0.06
gdzie       0.06
druhé       0.06
.choice     0.06
_COOKIE     0.06
hub         0.06
.bs         0.06
,[          0.06
Activation Density: 0.387%
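Activation density is usually the fraction of token positions in a sample where the neuron fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0.0. The function name `activation_density` and the sample data are hypothetical and only illustrate the arithmetic. They are not the dashboard's actual implementation.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a neuron counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the neuron fires on 4 of 1,000 tokens.
acts = [0.0] * 996 + [1.2, 0.8, 2.5, 0.3]
print(f"{activation_density(acts):.3%}")  # prints 0.400%
```

A density of 0.387% means the neuron is active on roughly 4 of every 1,000 tokens. That rarity fits a neuron tuned to proper nouns.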