INDEX
    Explanations

    references to stereotypes and biases in various contexts

    Negative Logits
    "uu"            -0.15
    "ç§Ģ"           -0.15
    "asive"         -0.15
    "lier"          -0.15
    "liers"         -0.15
    "cl"            -0.15
    " Grove"        -0.15
    "aned"          -0.15
    "urf"           -0.14
    " Integrity"    -0.14
    Positive Logits
    "apse"          0.14
    "isini"         0.14
    " Fay"          0.14
    "éĺħ"           0.14
    "ç®"            0.13
    " Caps"         0.13
    "\Entities"     0.13
    " HOLDERS"      0.13
    "==>"           0.13
    "ин"            0.13
    Activation Density: 0.044%

    No Known Activations