Explanations
This neuron fires on metadata labels (e.g. "author," "bibliography") in academic-paper headers.
Negative Logits
patri          -0.07
azi            -0.07
.Horizontal    -0.07
mountains      -0.07
harmless       -0.07
denn           -0.06
(pl            -0.06
esidir         -0.06
/md            -0.06
mary           -0.06
Positive Logits
acesso         0.07
Flip           0.07
ENSIONS        0.07
GDPR           0.06
PRIMARY        0.06
CSV            0.06
ble            0.06
έντρο          0.06
페이지         0.06
Signature      0.06
Activations Density 0.004%