    Explanations

    arguments and debate

    This neuron detects the words “straw” and “man,” i.e. occurrences of the phrase “straw man.”

    Negative Logits
    -0.07  "查看"
    -0.07  " screenshots"
    -0.06  "//================================================"
    -0.06  "자동"
    -0.06  " flotation"
    -0.06  ".Threading"
    -0.06  "Identity"
    -0.06  " چشم"
    -0.06  "ritional"
    -0.06  "('../"
    Positive Logits
    0.07  "uem"
    0.06  (token blank in source)
    0.06  " ges"
    0.06  "ável"
    0.06  " burglary"
    0.06  " sigu"
    0.06  " constitu"
    0.06  "paren"
    0.06  "ATORY"
    0.06  " rb"
    Activation Density: 0.056%

    No Known Activations