Explanations
The neuron fires on document-level framing and discourse markers: words that introduce or highlight the authors' aims, purpose, or key points (e.g., "however," "purpose," "note," "highlight," "aim," "significance").
Negative Logits
shows        -0.08
RT           -0.07
EACH         -0.06
SZ           -0.06
show         -0.06
Recording    -0.06
refine       -0.06
ステ         -0.06
:X           -0.06
false        -0.06
Positive Logits
Mitsubishi               0.07
"/>                      0.06
.Html                    0.06
(token not displayed)    0.06
líb                      0.06
THC                      0.06
fü                       0.06
_Core                    0.06
çalışmalar               0.06
모든                     0.06
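The page does not say how the two logit lists are produced. A common approach, assumed here and not confirmed by the source, is direct logit attribution: project the neuron's output weight vector onto the unembedding matrix to get one logit contribution per vocabulary token, then report the most negative and most positive entries. The sketch below is a minimal pure-Python version of that idea; `w_out`, `W_U`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical names, and the toy numbers are made up only to exercise the code.

```python
from typing import List, Sequence, Tuple


def logit_effects(w_out: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) neuron output direction onto every vocab token.

    w_out has length d_model; W_U is d_model x vocab_size (row i = residual dim i).
    Returns one direct-logit contribution per vocabulary token.
    """
    d_model = len(w_out)
    vocab_size = len(W_U[0])
    return [sum(w_out[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab_size)]


def top_and_bottom(
    effects: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # largest first
    return negative, positive


# Toy example: d_model = 2, four-token vocabulary (values are illustrative only).
vocab = ["shows", "Mitsubishi", "RT", "fü"]
w_out = [1.0, 0.5]
W_U = [
    [-0.05, 0.04, -0.03, 0.02],
    [-0.06, 0.06, -0.08, 0.02],
]
effects = logit_effects(w_out, W_U)
negative, positive = top_and_bottom(effects, vocab, k=2)
print("negative:", [(tok, round(v, 2)) for tok, v in negative])
print("positive:", [(tok, round(v, 2)) for tok, v in positive])
```

With real weights the lists would be taken over the full vocabulary (typically with k = 10, matching the ten entries per list above), and a library such as NumPy would replace the nested loops.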
Activation Density: 0.031%
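Activation density is usually read as the percentage of evaluated tokens on which the neuron fires. The page gives neither the firing threshold nor the number of tokens sampled, so the sketch below assumes a hypothetical threshold of 0 (the neuron "fires" when its activation is strictly positive). Under that reading, 0.031% corresponds to roughly 31 firing tokens per 100,000, which makes this a very sparse neuron. `activation_density` is a hypothetical helper, not part of any named tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; the source does not state one.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Illustrative data: 31 positive activations among 100,000 tokens.
acts = [0.0] * 100_000
for i in range(31):
    acts[i * 1000] = 1.0
print(f"{activation_density(acts):.3f}%")
```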