INDEX
    Explanations

    attracting attention

    The neuron fires on erotically charged or provocative language, particularly words used in sexualized, attention-grabbing descriptions.

    Negative Logits
    Token          Logit
    " Cait"        -0.07
    " Hubb"        -0.07
    " Eating"      -0.07
    "كان"          -0.07
    " Merr"        -0.06
    "_rad"         -0.06
    "iação"        -0.06
    " measuring"   -0.06
    "ると"         -0.06
    ".Year"        -0.06
    Positive Logits
    Token                     Logit
    "_WRONG"                  0.06
    "grupo"                   0.06
    " Freeze"                 0.06
    "-establish"              0.06
    ".gb"                     0.06
    "_fix"                    0.06
    "рий"                     0.06
    (whitespace-only token)   0.06
    "indows"                  0.06
    "endor"                   0.06
    Act Density 0.027%

    No Known Activations