    Explanations

    discussions about social inequality and racism

    Negative Logits
    Token         Logit
    "-span"       -0.18
    " Spokane"    -0.17
    " Span"       -0.16
    " Henrik"     -0.15
    "Spain"       -0.15
    " spans"      -0.15
    " Swal"       -0.15
    " Sikh"       -0.14
    "ucson"       -0.14
    "ø"           -0.14

    Positive Logits
    Token         Logit
    " Brazilian"  0.55
    " Brazil"     0.54
    "Brazil"      0.48
    " Braz"       0.48
    " Sao"        0.46
    " Brasil"     0.45
    " Bras"       0.45
    " brazil"     0.44
    " São"        0.44
    " Rio"        0.40
    Activation Density: 0.137%

    No Known Activations