INDEX
    Explanations

    references to harmful effects or substances

    Negative Logits
    "PRNewswire"    -0.56
    " mustache"     -0.48
    " elegance"     -0.48
    " baptism"      -0.48
    "zwe"           -0.48
    " trow"         -0.47
    " acquittal"    -0.47
    " mystère"      -0.47
    "Itr"           -0.46
    "IZABETH"       -0.46

    Positive Logits
    " harmful"       1.91
    "Harmful"        1.84
    " Harmful"       1.82
    " injurious"     1.14
    "有害"           1.09
    "harm"           1.02
    " dangerous"     1.01
    " hurtful"       0.95
    "dangerous"      0.94
    " toxic"         0.91

    Act Density 0.012%

    No Known Activations