INDEX
    Explanations

    terms or contexts with negative connotations

    Negative Logits
     Nicarag    -0.74
     Manson     -0.67
    ciating     -0.67
     Revis      -0.66
     Flores     -0.63
     sodium     -0.61
     dispatch   -0.60
    selage      -0.60
     extrem     -0.60
     Bosnia     -0.60
    Positive Logits
    share        1.08
    rate         1.05
    out          1.01
    through      0.99
    along        0.99
    away         0.99
    cation       0.99
    atten        0.98
    outs         0.98
    cycle        0.96
    Activation Density: 0.057%

    No Known Activations