    Explanations

    demographics

    The neuron detects words and phrases explicitly referring to race or racial attributes (e.g., “race,” “racial,” “skin color”).

    Negative Logits
     challenger
    -0.08
    otted
    -0.07
    rnd
    -0.07
    ุค
    -0.07
    -ground
    -0.07
     против
    -0.07
    Trading
    -0.06
    	TR
    -0.06
     caves
    -0.06
    Animation
    -0.06
    Positive Logits
    (utf
    0.07
     العن
    0.06
    .transfer
    0.06
    (IServiceCollection
    0.06
     برگزار
    0.06
    -main
    0.06
    แห
    0.06
     ByteArrayInputStream
    0.06
     huyện
    0.06
    /MPL
    0.06
    Activation Density: 0.030%

    No Known Activations