INDEX
    Explanations

    This neuron activates on the word “paper,” flagging user requests for academic or informational papers.

    Negative Logits
    Token           Logit
    "-control"      -0.07
    " XCT"          -0.07
    " Elite"        -0.07
    "ut"            -0.07
    "_DU"           -0.07
    " immigrant"    -0.06
    " joined"       -0.06
    "achat"         -0.06
    "UNT"           -0.06
    " control"      -0.06
    Positive Logits
    Token           Logit
    " paper"        0.14
    " papers"       0.13
    " Paper"        0.12
    "-paper"        0.10
    " Papers"       0.10
    "Paper"         0.10
    "paper"         0.10
    "aper"          0.10
    " Back"         0.09
    "apers"         0.08
    Activation Density: 0.016%

    No Known Activations