INDEX
    Explanations

    concepts related to race, identity, and societal issues

    Negative Logits
    ervo        -0.17
     âĹĦ        -0.16
    itant       -0.15
     èŤ         -0.14
    ìłĦìĹIJ      -0.14
    cken        -0.14
    arges       -0.14
    uels        -0.13
    ptal        -0.13
    earn        -0.13
    Positive Logits
    åį«         0.14
    .details    0.14
     Bart       0.14
     underst    0.13
    кÑĢаÑĹ      0.13
     Ton        0.13
    ikan        0.13
    bart        0.13
    RSS         0.13
    ucha        0.12
    Activation Density: 0.039%

    No Known Activations