Explanations
The neuron activates on contrastive discourse markers, such as “but” or “however”, that introduce a counterpoint or exception.
Negative Logits
Token             Logit
blem              -0.07
Weekly            -0.07
aphore            -0.07
erti              -0.06
่าย               -0.06
AssertionError    -0.06
Laurie            -0.06
_imm              -0.06
Problem           -0.06
یل                -0.06
Positive Logits
Token             Logit
AA                0.08
зд                0.07
pierced           0.06
Trends            0.06
_AN               0.06
rotate            0.06
S                 0.06
PEND              0.06
Крас              0.06
↵                 0.06
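Lists like the two above are commonly built by projecting a neuron's output direction onto the unembedding matrix and ranking tokens by the resulting score. The sketch below illustrates that idea only. It is an assumption about how such a dashboard might be computed, not a description of this page's actual pipeline, and the names `logit_effects` and `top_and_bottom`, the toy vocabulary, and the weights are all hypothetical.

```python
def logit_effects(direction, unembed, vocab):
    """Dot the (hypothetical) neuron output direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(direction, row))
        for token, row in zip(vocab, unembed)
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    lowest = ranked[:k]
    highest = ranked[::-1][:k]
    return lowest, highest


# Toy, made-up values purely for illustration (not taken from the page above).
vocab = ["but", "the", "and"]
unembed = [[0.5, 0.0], [0.0, 0.5], [0.1, 0.1]]
direction = [1.0, -1.0]

effects = logit_effects(direction, unembed, vocab)
lowest, highest = top_and_bottom(effects, k=1)
print("negative:", lowest)
print("positive:", highest)
```

With scores as small and closely spaced as those listed here (roughly 0.06 to 0.08 in magnitude), the ranking may carry little signal, so the tokens are probably best read as weak hints rather than strong evidence.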
Activation Density: 0.033%
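Activation density is usually read as the fraction of sampled token positions on which the neuron fires. A minimal sketch of that calculation follows, assuming "fires" means an activation strictly above a threshold of zero. The function name `activation_density`, the threshold, and the sample data are hypothetical, and the page's own definition may differ.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active positions out of 10,000 tokens.
sample = [0.0] * 9997 + [0.4, 1.2, 0.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

A density of this order means the neuron is silent on nearly all tokens, which is consistent with a feature tied to a narrow class of words.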