INDEX
    Explanations

    terms related to skin color and racial identity

    Negative Logits
    ius         -0.19
    ël          -0.15
    uren        -0.15
     Ú¯ÙĪ       -0.15
    ador        -0.14
    annis       -0.14
    reten       -0.14
    itas        -0.14
    лам         -0.14
    .Interop    -0.13
    Positive Logits
    ella        0.15
    culos       0.14
    atty        0.14
    vail        0.14
    audi        0.14
    boom        0.14
    ipo         0.13
    cken        0.13
    tplib       0.13
    atter       0.13
    Activation Density: 0.012%

    No Known Activations