INDEX
    Explanations

    The neuron detects words signaling a polite or courteous tone (e.g., “politely”).

    New Auto-Interp
    Negative Logits
     fats
    -0.07
     BI
    -0.07
     ninh
    -0.06
    mins
    -0.06
     masturbation
    -0.06
     اك
    -0.06
    _send
    -0.06
     withStyles
    -0.06
     awareness
    -0.06
    .smtp
    -0.06
    POSITIVE LOGITS
     Published
    0.07
    aves
    0.07
    0.07
    }_{
    0.06
     Override
    0.06
    atıcı
    0.06
    offline
    0.06
    _Renderer
    0.06
    Contained
    0.06
     Investig
    0.06
    Act Density 0.019%

    No Known Activations