INDEX
    Explanations

    racist statements against black people.

    hate speech

    Negative Logits
    ArgsConstructor    -0.72
    WaitGroup          -0.68
     BorderRadius      -0.63
     intptr            -0.63
    saraba             -0.60
     NSCoder           -0.59
    WebVitals          -0.58
    +#+#               -0.58
     Normdatei         -0.57
    ########.          -0.57
    Positive Logits
    */].               0.55
    řské               0.51
    Vanjske            0.46
    mpä                0.46
    (".");             0.45
     Glaser            0.45
    frey               0.45
                       0.45
     específicamente   0.45
    Fré                0.44
    Activation Density: 1.323%

    No Known Activations