INDEX
    Explanations

    This neuron primarily detects occurrences of the word “sex.”

    Negative Logits
    oub         -0.07
     Soup       -0.06
     Address    -0.06
                -0.06
    _m          -0.06
    -back       -0.06
    ेल          -0.06
    -sm         -0.06
     NETWORK    -0.06
    shape       -0.06

    Positive Logits
     Cherokee    0.07
     tienes      0.07
     hybrids     0.07
    (register    0.07
    бо           0.06
     cevap       0.06
     Atatürk     0.06
     fuck        0.06
     demonstr    0.06
     alleging    0.06
    Act Density 0.017%

    No Known Activations