INDEX
    Explanations

    instances of strong language or racial and ethnic slurs

    Negative Logits
    Token            Logit
    "orgh"           -0.18
    " Wilkinson"     -0.14
    "rix"            -0.14
    "istringstream"  -0.14
    "hare"           -0.14
    "850"            -0.14
    "957"            -0.13
    "jezd"           -0.13
    "ikki"           -0.13
    "ores"           -0.13
    Positive Logits
    Token            Logit
    " prof"          0.49
    " curse"         0.44
    " swear"         0.41
    " swearing"      0.40
    " curs"          0.39
    " curses"        0.36
    " Prof"          0.36
    "prof"           0.34
    " Curse"         0.34
    " obsc"          0.34
    Activation Density: 0.135%

    No Known Activations