    Explanations

    negative sentiments or criticism towards others

    expressions of disdain or critique towards individuals or groups

    Negative Logits
    "emale"         -0.83
    "ieth"          -0.81
    "cially"        -0.74
    "winner"        -0.69
    "ahon"          -0.66
    "iverse"        -0.66
    "urally"        -0.65
    " detrim"       -0.64
    " Impact"       -0.64
    "rimination"    -0.63
    Positive Logits
    " concoct"          0.99
    " indul"            0.94
    " instinctively"    0.93
    " obsessed"         0.89
    " indulge"          0.89
    " resorted"         0.86
    " impuls"           0.86
    " urge"             0.86
    " craving"          0.84
    " wandered"         0.84
    Activation Density: 0.538%

    No Known Activations