INDEX
    Explanations

    phrases indicating significant negative impact or consequence

    negative impacts and their effects on various groups or entities

    Negative Logits
    Token             Logit
    "NESS"            -0.69
    "wine"            -0.67
    " mutated"        -0.66
    " whore"          -0.63
    "fw"              -0.62
    " atro"           -0.60
    " supplied"       -0.60
    "cles"            -0.59
    " rods"           -0.58
    "qi"              -0.58

    Positive Logits
    Token             Logit
    " morale"          0.77
    " welf"            0.69
    "ãĤ®"              0.66
    " campaigners"     0.64
    " whistle"         0.62
    " whistlebl"       0.62
    " Palestin"        0.62
    "ibaba"            0.62
    "vae"              0.62
    "warts"            0.62

    Activation Density: 0.182%

    No Known Activations