INDEX
    Explanations

    critical online discussions

    The neuron fires on direct-address insults and profanity (e.g. “you,” “screw yourself,” “f*ck you”), that is, hostile or abusive second-person language.

    Negative Logits
    laughs          -0.07
    Mon             -0.07
    /includes       -0.07
    โรค             -0.06
     ListTile       -0.06
     subclasses     -0.06
     REC            -0.06
     recognition    -0.06
    Songs           -0.06
    ادات            -0.06
    Positive Logits
    ResourceId      0.06
     _{             0.06
                    0.06
    ULLET           0.06
     cev            0.06
     ولي            0.06
     rotten         0.06
     _:             0.06
    ---↵            0.06
    ện              0.06
    Act Density 0.064%

    No Known Activations