INDEX
    Explanations

    The neuron activates on mentions of anger or aggressive/emotional hostility (e.g. “anger,” “angry,” “aggression”).

    Negative Logits
    Token                      Logit
    " ebook"                   -0.07
    " follic"                  -0.07
    (whitespace-only token)    -0.07
    "702"                      -0.07
    " touted"                  -0.07
    " نو"                      -0.06
    "experience"               -0.06
    " coincide"                -0.06
    "44"                       -0.06
    "15"                       -0.06
    Positive Logits
    Token                      Logit
    " anger"                   0.12
    " angry"                   0.10
    " rage"                    0.07
    " Angry"                   0.07
    " άλ"                      0.07
    "ující"                    0.07
    "_KERNEL"                  0.07
    ".opacity"                 0.06
    "PY"                       0.06
    " geniş"                   0.06
    Act Density 0.007%
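    For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of sampled token positions on which the neuron's activation exceeds zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is a simple firing rate over sampled tokens.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 7 firing positions among 100,000 tokens.
sample = [0.0] * 99_993 + [0.4, 1.2, 0.1, 2.5, 0.9, 0.3, 0.7]
density = activation_density(sample)
print(f"Act Density {density:.3%}")
```

    Under this assumed definition, a density of 0.007% corresponds to roughly 7 firing positions per 100,000 tokens, which suggests a very sparse neuron.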

    No Known Activations