    Explanations

    words related to racism and discrimination

    Negative Logits
    Token            Logit
    "Nuorodos"       -0.60
    "**********/"    -0.53
    " Kön"           -0.52
    "uttosto"        -0.51
    " svolge"        -0.50
    "Debido"         -0.50
    "...');"         -0.50
    "Economía"       -0.49
    " meras"         -0.48
    " citroen"       -0.48
    Positive Logits
    Token            Logit
    " racism"        1.03
    " racist"        0.96
    " Racism"        0.93
    "Racism"         0.91
    "racist"         0.77
    "racism"         0.76
    " racial"        0.76
    " Rac"           0.76
    " shewn"         0.73
    " Racial"        0.73
    Activation Density: 0.059%

    No Known Activations