Explanations
The neuron activates on in-text citation markers (e.g. numbered or bracketed reference/footnote indicators).
Negative Logits
medi     -0.08
gett     -0.07
ницт     -0.07
Mi       -0.07
roke     -0.07
psych    -0.07
maint    -0.07
mi       -0.07
نت       -0.06
帰       -0.06
Positive Logits
"]==     0.07
]]       0.07
Lol      0.07
باشند    0.07
!*       0.07
Tear     0.06
ule      0.06
(blank)  0.06
ตำ       0.06
Torrent  0.06
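The logit lists above rank vocabulary tokens by how strongly this neuron pushes their output logits down or up. Below is a minimal sketch of how such a ranking could be produced. It assumes the effect on each token is the dot product of the neuron's output weight vector with that token's unembedding vector (the common direct-logit-attribution approach); the dashboard's exact procedure is not stated here. The direction, the unembedding rows, and the function names `logit_effects` and `top_and_bottom` are hypothetical toy values chosen for illustration.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot each token's unembedding vector with the (hypothetical) neuron output direction."""
    return {tok: sum(d * w for d, w in zip(direction, row)) for tok, row in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy 2-dimensional example; every vector here is made up, not taken from the model.
direction = [0.5, -0.25]
unembed = {
    "]]": [0.2, -0.1],
    "medi": [-0.1, 0.1],
    "Lol": [0.1, 0.0],
    "gett": [0.0, 0.2],
}

neg, pos = top_and_bottom(logit_effects(direction, unembed), k=2)
print("negative:", [tok for tok, _ in neg])  # negative: ['medi', 'gett']
print("positive:", [tok for tok, _ in pos])  # positive: [']]', 'Lol']
```

On a real model, `unembed` would hold one vector per vocabulary entry, so the same sort produces lists like the ones shown, only over tens of thousands of tokens.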
Activation Density: 0.009%
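Activation density is the share of token positions on which the neuron fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero; the threshold is an assumption, not something the page states. Under that reading, 0.009% corresponds to about 9 firing positions per 100,000 tokens. The helper name `activation_density` and the activation values are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# Made-up activations: the neuron fires on 9 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(0, 90, 10):  # positions 0, 10, ..., 80 -> 9 positions
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.009%
```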