Explanations
The neuron activates on word‐initial “mon(o)-” fragments (e.g. monogenic, monocular, monolithic).
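As a rough illustration of the pattern described above, the following sketch flags words that begin with "mon" using a regular expression. It approximates the written explanation only, not the neuron itself: the names `MON_PREFIX` and `find_mon_words` are hypothetical, and the pattern is deliberately broad, so it also matches words such as "Monica" or "monastery" that start with "mon" without the "mono-" prefix.

```python
import re

# Hypothetical approximation of the explanation text, not the neuron:
# a word boundary, then "mon" (any case), then at least one more letter.
MON_PREFIX = re.compile(r"\bmon[a-z]+", re.IGNORECASE)


def find_mon_words(text):
    """Return the words in `text` that start with "mon"."""
    return MON_PREFIX.findall(text)


if __name__ == "__main__":
    sample = "A monogenic trait, monocular vision, and a monolithic design; also harmony."
    # "harmony" contains "mon" mid-word, so it is not matched.
    print(find_mon_words(sample))
```

The word boundary `\b` is what makes the match word-initial; dropping it would also flag "mon" inside words such as "harmony" or "common".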
Negative Logits
Token    Value
eff      -0.08
!I       -0.07
ีเอ      -0.07
Vital    -0.07
Wit      -0.07
энерг    -0.07
fang     -0.07
eff      -0.07
Eff      -0.07
拔       -0.07
Positive Logits
Token        Value
Mon          0.14
mon          0.13
Mon          0.13
mon          0.12
Monica       0.10
.Mon         0.09
_mon         0.08
MON          0.08
-mon         0.08
monastery    0.08
Activation density: 0.033%