Explanations
The neuron selectively activates on personal names (proper nouns referring to people).
Negative Logits
quant  -0.06
alach  -0.06
シャ  -0.06
Pru  -0.06
арь  -0.06
LICENSE  -0.06
TR  -0.06
Rust  -0.05
nor  -0.05
裡  -0.05
Positive Logits
働  0.07
怒  0.07
cé  0.07
understood  0.06
(other  0.06
-json  0.06
であった  0.06
ـ  0.06
whirl  0.06
Www  0.06
Activations Density: 0.129%