INDEX
    Explanations

    The neuron activates on mentions of romantic or sexual scandals, particularly the word “affair” and its surrounding context.

    Negative Logits
        " قابلیت"              -0.07
        " fees"                -0.07
        " ilaç"                -0.07
        "Es"                   -0.06
        " Peace"               -0.06
        (token not rendered)   -0.06
        "imizde"               -0.06
        "soon"                 -0.06
        "ساس"                  -0.06
        " Ao"                  -0.06

    Positive Logits
        "(relative"             0.06
        "admins"                0.06
        (token not rendered)    0.06
        " picked"               0.06
        " exist"                0.06
        " Rica"                 0.06
        "ijk"                   0.06
        " comfy"                0.06
        " pretty"               0.06
        "_coordinate"           0.06
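    The page does not say how the logit lists are produced. A common convention for neuron dashboards is to take the dot product of the neuron's output weight vector with each token's unembedding column and report the most negative and most positive scores. The sketch below assumes that convention; `logit_effects`, `w_out`, `w_u`, and the toy vocabulary are hypothetical and not taken from this neuron.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    w_out: List[float],
    w_u: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Rank tokens by the neuron's direct effect on their logits.

    Assumption: the effect on token t is the dot product of the neuron's
    output weights (w_out, length d_model) with t's unembedding column.
    Returns (k most negative, k most positive), each as (token, score) pairs.
    """
    scores = {
        token: sum(w * u for w, u in zip(w_out, column))
        for token, column in w_u.items()
    }
    ascending = sorted(scores.items(), key=lambda item: item[1])
    most_negative = ascending[:k]          # lowest scores first
    most_positive = ascending[::-1][:k]    # highest scores first
    return most_negative, most_positive


# Toy example: d_model = 2 and a four-token vocabulary (hypothetical values).
w_out = [1.0, -0.5]
w_u = {
    "tok_a": [0.3, 0.0],
    "tok_b": [-0.1, 0.0],
    "tok_c": [0.0, -0.2],
    "tok_d": [0.0, 0.1],
}
negative, positive = logit_effects(w_out, w_u, k=2)
print(negative)  # [('tok_b', -0.1), ('tok_d', -0.05)]
print(positive)  # [('tok_a', 0.3), ('tok_c', 0.1)]
```

    With a real model, `w_out` and `w_u` would come from the model's weights, and the per-token loop would be a single matrix-vector product over the full vocabulary.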
    Activation Density: 0.024%
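    Activation density is most plausibly the share of token positions on which the neuron fires. Assuming "fires" means an activation strictly above zero (the page does not state its threshold), 0.024% is a fraction of 0.00024, roughly 24 tokens in every 100,000, or about 1 token in 4,170. A minimal sketch of that calculation, with `activation_density` and `threshold` as hypothetical names:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a neuron "fires" when its activation is strictly above the threshold.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activation_density needs at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical sample: 24 firing positions out of 100,000 tokens.
sample = [0.0] * 99_976 + [1.5] * 24
print(f"{activation_density(sample):.3f}%")  # 0.024%
```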

    No Known Activations