INDEX
    Explanations

    sentences containing phrases related to criticism or negative evaluation

    phrases indicating negative impacts or consequences

    Negative Logits
    Token             Logit
    "gency"           -0.74
    "ugu"             -0.73
    "cu"              -0.71
    "ory"             -0.69
    "apter"           -0.69
    " Closing"        -0.63
    "uther"           -0.60
    "monton"          -0.60
    "gery"            -0.60
    " veil"           -0.60
    Positive Logits
    Token             Logit
    " also"            0.96
    " downright"       0.93
    "also"             0.85
    " ALSO"            0.83
    " actively"        0.82
    " secondly"        0.76
    "Secondly"         0.76
    " strategically"   0.73
    "cially"           0.69
    "DES"              0.69
    Activation Density: 0.106%

    No Known Activations