INDEX
    Explanations

    words related to negative consequences or adverse effects

    phrases that indicate causation or harmful effects

    Negative Logits
    Token           Logit
    " Technique"    -0.73
    "ramid"         -0.68
    " Niet"         -0.67
    " skelet"       -0.65
    "halla"         -0.65
    "aeper"         -0.65
    " motto"        -0.64
    "atu"           -0.63
    "ian"           -0.63
    "stra"          -0.61
    Positive Logits
    Token             Logit
    " havoc"          1.12
    " cele"           0.93
    " irre"           0.85
    "ティ"            0.84
    " mayhem"         0.81
    " trouble"        0.81
    " headaches"      0.80
    "parable"         0.79
    " unnecessary"    0.74
    "hift"            0.74
    Activation Density: 0.043%

    No Known Activations