INDEX
    Explanations

    Disgusting texts

    the main thing this neuron does is spot strongly negative, disgust-evoking adjectives (e.g. “revolting,” “disgusting,” “gross”)

    New Auto-Interp
    Negative Logits
     buy
    -0.07
     stud
    -0.07
     map
    -0.06
     head
    -0.06
     Fit
    -0.06
     spr
    -0.06
     fontWeight
    -0.06
    filtr
    -0.06
    .xxx
    -0.06
    вар
    -0.06
    POSITIVE LOGITS
     exhilar
    0.06
    _<
    0.06
     DPI
    0.06
    年の
    0.06
    iverz
    0.06
    .Grid
    0.06
    /article
    0.06
    )"↵
    0.06
    UBLIC
    0.06
    corev
    0.06
    Act Density 0.132%

    No Known Activations