INDEX
    Explanations

    negative statements or contrasts

    negative phrases and concepts regarding evidence and social issues

    Negative Logits
    Token        Logit
    "igm"        -0.77
    "ipeg"       -0.74
    "ovember"    -0.66
    " Nare"      -0.65
    "iatus"      -0.64
    "WN"         -0.64
    "uden"       -0.63
    "affe"       -0.62
    " Shan"      -0.62
    "Notes"      -0.61
    Positive Logits
    Token               Logit
    "asso"              0.69
    "cause"             0.66
    " decency"          0.64
    " clot"             0.63
    " slightest"        0.63
    " la"               0.61
    "ilings"            0.60
    " hypocr"           0.59
    " sensit"           0.57
    " righteousness"    0.57
    Activation Density: 0.261%

    No Known Activations