    Explanations

    The neuron fires on explicit “give a fuck”-style phrases, that is, profanity expressing an attitude of not caring.

    Negative Logits
    -0.07
     trick
    -0.06
    MAR
    -0.06
     нак
    -0.06
    ا�
    -0.06
    	       
    -0.06
     Với
    -0.06
    @↵↵
    -0.06
    _ASS
    -0.06
    Stan
    -0.06
    Positive Logits
     EVENT
    0.07
     gelen
    0.07
     gesch
    0.07
     Mountains
    0.07
    Memo
    0.06
    wij
    0.06
     Scientific
    0.06
    _handle
    0.06
    localObject
    0.06
     Tüm
    0.06
    Act Density 0.003%

    No Known Activations