INDEX
    Explanations

    mixed content/languages

    The neuron activates strongly on parentheses, especially the “(” token and, to a lesser extent, “)” and the end-of-text marker; that is, it detects parenthetical asides.

    responses that advocate for respectful communication regarding weight and body image.

    Negative Logits
    Token           Logit
    " marc"         -0.07
    ".ent"          -0.06
    " Webcam"       -0.06
    "session"       -0.06
    "(on"           -0.06
    " XL"           -0.06
    " đồ"           -0.06
    "хови"          -0.06
    "=models"       -0.06
    ".element"      -0.06
    Positive Logits
    Token           Logit
    "Japan"         0.07
    "สาม"           0.06
    " अख"           0.06
    "ÜR"            0.06
    " postpon"      0.06
    ".Undef"        0.06
    "ouden"         0.06
    "ساس"           0.06
    "(char"         0.06
    "Pokemon"       0.06
    Activation Density: 0.012%

    No Known Activations