    Explanations

    The neuron detects skin-related descriptors, especially the words “skin,” “tone,” or “type.”

    Negative Logits
    Token               Logit
    " wildfires"        -0.07
    "oter"              -0.07
    "ignant"            -0.07
    " Adult"            -0.07
    " volunteering"     -0.07
    "ोर"                -0.07
    " conservatives"    -0.06
    " volunteer"        -0.06
    " celebrity"        -0.06
    "edu"               -0.06

    Positive Logits
    Token               Logit
    "=__"               0.07
    "‐‐"                0.06
    "nombre"            0.06
    "'/>↵"              0.06
    "-CN"               0.06
    "학생"              0.06
    " hlub"             0.06
    "/ay"               0.06
    "CGFloat"           0.06
    " витами"           0.06
    Activation Density: 0.056%

    No Known Activations