INDEX
    Explanations

    phrases containing derogatory language and offensive remarks

    Negative Logits
     vogli      -0.74
     rispond    -0.72
     dimenti    -0.69
     desideri   -0.67
     trovo      -0.67
     trovi      -0.64
     credere    -0.64
     auguri     -0.63
     vedi       -0.63
     voleva     -0.63
    Positive Logits
     insults    0.67
     insult     0.60
     insulting  0.59
     remarks    0.58
     hurled     0.56
     remark     0.53
     verbally   0.53
     derogatory 0.51
     uttered    0.51
     comments   0.50
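The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way such lists are produced, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the extremes. The sketch below is plain Python; `logit_effects`, `top_logits`, and the two-dimensional toy vectors are hypothetical illustrations chosen to reproduce a few of the values shown, not real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Project a (hypothetical) feature decoder direction onto each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy data: 2-d vectors are made up; leading spaces mark word-initial tokens.
toy_decoder = [1.0, 0.0]
toy_unembed = {
    " insults": [0.67, 0.1],
    " remarks": [0.58, -0.2],
    " vogli": [-0.74, 0.3],
}
effects = logit_effects(toy_decoder, toy_unembed)
positive, negative = top_logits(effects, k=1)
print(positive)  # [(' insults', 0.67)]
print(negative)  # [(' vogli', -0.74)]
```

Sorting once and slicing from both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid a full sort.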
    Activation density: 0.435%
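Activation density is read here, as an assumption since the page does not define it, as the percentage of sampled token positions on which the feature's activation is strictly positive; the threshold and token sample behind the 0.435% figure are not stated. The hypothetical `activation_density` helper below makes that reading concrete with made-up counts.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample: 435 active positions out of 100,000 tokens (made-up counts).
sample = [1.0] * 435 + [0.0] * (100_000 - 435)
print(activation_density(sample))  # 0.435
```

Under this reading, a density of 0.435% means the feature fires on roughly 1 in 230 tokens (1 / 0.00435 ≈ 230).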

    No Known Activations