Explanations
This neuron fires on personal names and other references to individual people (proper nouns).
Negative Logits
Token           Logit
Inspection      -0.08
Excellence      -0.08
دین             -0.07
affiliation     -0.06
Contribution    -0.06
درب             -0.06
Cou             -0.06
oğ              -0.06
싱              -0.06
Facility        -0.06
Positive Logits
Token           Logit
ensored         0.07
Missing         0.06
bastante        0.06
jr              0.06
jeho            0.06
ivate           0.06
ानक             0.06
ertz            0.06
write           0.06
_CRC            0.06
Activation Density: 0.020%
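
The density figure is commonly read as the share of token positions on which the neuron has a nonzero activation. The page itself does not define it, so the sketch below is one plausible reading and not the dashboard's actual computation. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions on which
    the neuron is active. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the neuron is active on 2 of 10,000 token positions.
sample = [0.0] * 9998 + [1.3, 0.7]
print(f"{activation_density(sample):.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.020% means the neuron is active on roughly 2 of every 10,000 tokens, which would make it a very sparse unit.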