INDEX
    Explanations

    terms related to toxicity and its measurement

    Negative Logits
      "CrossRef"            -0.61
      "jména"               -0.60
      " arcos"              -0.58
      "catore"              -0.57
      " coste"              -0.56
      "mote"                -0.56
      " kleid"              -0.55
      "bership"             -0.55
      "れて"                -0.55
      "cheid"               -0.55

    Positive Logits
      "toxicity"             1.69
      " تانيه"               0.82
      "\{\\"                 0.80
      "enumi"                0.80
      " **/\n"               0.72
      " BoxDecoration"       0.70
      " pinulongan"          0.70
      " Koran"               0.70
      " muualla"             0.68
      " DialogInterface"     0.67
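    Dashboards of this kind commonly derive the two logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and taking the k lowest and k highest scores. That convention is an assumption here, not something this page confirms. The names `top_logits`, `decoder_dir`, `W_U`, and `vocab` are hypothetical, used only for illustration.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Sketch (assumed convention): score every vocabulary token by projecting
    a feature's decoder direction (shape: d_model) through the unembedding
    matrix W_U (shape: d_model x vocab_size), then return the k most negative
    and k most positive (token, score) pairs."""
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
neg, pos = top_logits(
    np.array([1.0, 0.0]),
    np.array([[0.5, -1.0, 2.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
```

    Because only the direction is projected, the scores reflect what the feature pushes the model to predict when it fires, independent of any particular input.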
    Activation Density 0.021%
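    Activation density is usually read as the percentage of token positions on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; both the threshold and the function name `activation_density` are hypothetical, not taken from this page.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds
    `threshold` (an assumed firing criterion)."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: a feature that fires on 21 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:21] = 0.5
density = activation_density(acts)
```

    Under these assumptions, a density of 0.021% corresponds to roughly 21 firing positions per 100,000 tokens, i.e. a very sparse feature.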

    No Known Activations