INDEX
    Explanations

    offensive word detection

    Negative Logits
    "하거나"          0.46
    "하면서"          0.46
                      0.41
    " монта"          0.40
    " 있다는"         0.40
    "하지"            0.40
    "Reached"         0.39
    "kadot"           0.39
                      0.39
    "하지만"          0.39

    Positive Logits
    " Peralta"        0.45
    " ethnic"         0.44
    " redesigned"     0.43
    " ajustes"        0.43
    " patrols"        0.43
    " peine"          0.43
    " revamped"       0.42
    " surveillance"   0.41
    " zan"            0.41
    " Ethnic"         0.41
    Act Density 0.001%

    No Known Activations