INDEX
Explanations
code snippets
The neuron activates on placeholder name tokens (e.g., “NAME_#”); that is, it detects anonymized person-name markers.
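As a rough illustration of the kind of token this explanation refers to, the sketch below flags placeholder name markers in a string. The exact pattern is an assumption: the source only shows the form "NAME_#", so reading "#" as one or more digits is a guess, and `PLACEHOLDER_NAME` and `find_placeholder_names` are hypothetical names introduced here. It is not the neuron's actual computation.

```python
import re

# Assumption: "#" in "NAME_#" stands for one or more digits.
# The source does not specify the exact placeholder format.
PLACEHOLDER_NAME = re.compile(r"\bNAME_\d+\b")


def find_placeholder_names(text: str) -> list[str]:
    """Return every anonymized name marker (e.g. NAME_1) found in text."""
    return PLACEHOLDER_NAME.findall(text)


if __name__ == "__main__":
    sample = "NAME_1 met NAME_2 at the office, but Neil was absent."
    # Real names such as "Neil" are not placeholders and are not matched.
    print(find_placeholder_names(sample))
```

A plain regex like this only approximates the explanation's wording; the neuron itself may respond to other placeholder formats or to context that a pattern match does not capture.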
Negative Logits
Token     Logit
wl        -0.07
veil      -0.06
llib      -0.06
预        -0.06
Dış       -0.06
ซ         -0.06
dado      -0.06
grads     -0.06
_house    -0.06
ніби      -0.06
Positive Logits
Token      Logit
coroutine  0.07
ANG        0.07
іка        0.06
391        0.06
zvyš       0.06
Neil       0.06
HIS        0.06
ráp        0.06
큼         0.06
سام        0.06
Activations Density: 1.175%