Explanations
Citations
The neuron activates selectively on in-text citation markers and reference labels (e.g., bracketed “[@HS…]” tokens and author-initial tags).
Negative Logits
comm  -0.07
_inf  -0.07
Sawyer  -0.06
insured  -0.06
donation  -0.06
rust  -0.06
affidavit  -0.06
Replacing  -0.06
Raid  -0.06
vos  -0.06
Positive Logits
doğrult  0.07
허  0.06
Profes  0.06
dem  0.06
здійс  0.06
Değ  0.06
Vous  0.06
بم  0.06
าษฎ  0.06
Friendship  0.06
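The two lists above read as the ten most negative and ten most positive entries of a per-token logit-effect vector. A minimal sketch of that selection step follows, assuming the effects are available as a plain mapping from token to value; the function name `top_and_bottom_k` and the input dictionary are hypothetical, and only a few values are copied from the lists above for illustration.

```python
from __future__ import annotations


def top_and_bottom_k(
    logit_effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: `logit_effects` maps each vocabulary token to the change in its
    logit attributed to this neuron (hypothetical input format).
    """
    items = list(logit_effects.items())
    # Ascending sort: most negative effects first. sorted() is stable,
    # so tied values keep their original order.
    negative = sorted(items, key=lambda item: item[1])[:k]
    # reverse=True also preserves stability among ties.
    positive = sorted(items, key=lambda item: item[1], reverse=True)[:k]
    return negative, positive


# Illustrative subset of the values shown in the lists above.
effects = {
    "comm": -0.07,
    "_inf": -0.07,
    "Sawyer": -0.06,
    "doğrult": 0.07,
    "허": 0.06,
    "Friendship": 0.06,
}
neg, pos = top_and_bottom_k(effects, k=2)
print(neg)  # [('comm', -0.07), ('_inf', -0.07)]
print(pos)  # [('doğrult', 0.07), ('허', 0.06)]
```

Sorting the full vocabulary is simple but wasteful for large vocabularies; `heapq.nsmallest` and `heapq.nlargest` give the same top-k result without a full sort.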
Activation density: 0.014%
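A density of 0.014% means the neuron fires on roughly 14 of every 100,000 tokens, which fits a narrow trigger such as citation markers. The sketch below assumes density is the share of sampled tokens whose activation exceeds zero; the dashboard's exact threshold and sampling set are not stated here, so `threshold` and the synthetic data are illustrative assumptions.

```python
from __future__ import annotations


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is strictly above `threshold`.

    Assumption: "active" means activation > threshold (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 14 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(14):
    acts[i * 7000] = 1.0  # 14 distinct positions, all below 100,000

print(f"{activation_density(acts):.3f}%")  # 0.014%
```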