INDEX
    Explanations

    derogatory terms and insults related to people and groups

    insults and derogatory terms

    Negative Logits
    "OGND"              -0.59
    " okuyayım"         -0.45
    "באנגלית"           -0.45
    "ագրություններ"     -0.44
    " africaine"        -0.44
    "cstdio"            -0.42
    "archiviato"        -0.42
    " fédéral"          -0.41
    "Hozzáférés"        -0.40
    "お腹"              -0.40
    Positive Logits
    " idiots"           0.53
    " idiot"            0.52
    " morons"           0.51
    "EDEFAULT"          0.50
    " moron"            0.49
    "tagHelper"         0.48
    "randomUUID"        0.47
    "Idiot"             0.45
    " Idiot"            0.45
    " scound"           0.45
    Activation Density: 0.051%

    No Known Activations