Explanations
This neuron activates on the word “paper,” flagging user requests for academic or informational papers.
Negative Logits
-control     -0.07
XCT          -0.07
Elite        -0.07
ut           -0.07
_DU          -0.07
immigrant    -0.06
joined       -0.06
achat        -0.06
UNT          -0.06
control      -0.06
Positive Logits
paper        0.14
papers       0.13
Paper        0.12
-paper       0.10
Papers       0.10
Paper        0.10
paper        0.10
aper         0.10
Back         0.09
apers        0.08
Activations Density: 0.016%