INDEX
    Explanations

    negative words and sentiments related to emotions or conflicts

    negative connotations or expressions of disgust

    Negative Logits
    " trained"         -0.70
    " individually"    -0.69
    "imately"          -0.68
    "terday"           -0.67
    " silenced"        -0.66
    " deceived"        -0.66
    "theless"          -0.65
    " probable"        -0.65
    " curiously"       -0.64
    " misled"          -0.64
    Positive Logits
    "ocations"    1.02
    "aution"      1.01
    "tones"       0.97
    "eness"       0.90
    "ptions"      0.89
    "isons"       0.89
    "usions"      0.88
    "isms"        0.87
    "otypes"      0.87
    "ifts"        0.87
    Activation Density: 0.282%

    No Known Activations