Explanations
The neuron activates selectively on fragments of personal names, especially uncommon or foreign-sounding surnames.
Negative Logits
""" ↵    -0.07
verw    -0.06
mus    -0.06
(part    -0.06
*z    -0.06
obra    -0.05
उम    -0.05
hart    -0.05
Sus    -0.05
δεδο    -0.05
Positive Logits
ropping    0.08
Οικο    0.07
Κο    0.07
öh    0.07
connected    0.07
왜    0.06
.met    0.06
Driver    0.06
отв    0.06
trinsic    0.06
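Token lists like the two above are commonly produced by projecting a neuron's output weights onto the model's unembedding matrix and reading off the most negative and most positive entries. The page does not say how its values were computed, so the following Python sketch is an assumption: `w_out`, `W_U`, and `top_logit_effects` are hypothetical names, and any layer normalization before the unembedding is ignored.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by a neuron's direct effect on the logits (hypothetical helper).

    w_out: (d_model,) output weight vector of the neuron (assumed available).
    W_U:   (d_model, d_vocab) unembedding matrix; layer norm is ignored here.
    vocab: list of d_vocab token strings.
    """
    effects = w_out @ W_U          # logit change per unit of neuron activation
    order = np.argsort(effects)    # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1],
                [9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_out, W_U, ["a", "b", "c"], k=2)
print(neg)
print(pos)
```

Sorting once and slicing from both ends yields the negative and positive lists in a single pass over the vocabulary.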
Activation Density: 0.076%
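A density of 0.076% means the neuron is active on roughly 76 of every 100,000 tokens. A minimal sketch of the metric, assuming "active" means an activation strictly above zero (the page does not state its threshold); `activation_density` is a hypothetical helper:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 76 active tokens out of 100,000 reproduces the reported 0.076%.
acts = np.zeros(100_000)
acts[:76] = 1.0
density = activation_density(acts)
print(f"{density:.3f}%")
```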