INDEX
    Explanations

    phrases with the word "no"

    negations and phrases expressing refusal or prohibition

    Negative Logits
    Token             Logit
    "rn"              -0.79
    "often"           -0.76
    "typically"       -0.73
    "turned"          -0.72
    "mund"            -0.72
    "then"            -0.70
    "RAFT"            -0.69
    "ellect"          -0.69
    "nr"              -0.67
    "olutely"         -0.67

    Positive Logits
    Token             Logit
    " excuses"        1.05
    " exceptions"     1.00
    " refunds"        0.95
    "xious"           0.89
    " regrets"        0.87
    " compulsion"     0.85
    " surprises"      0.83
    " compromises"    0.83
    " tolerance"      0.82
    "isy"             0.82
    Act Density 0.099%

    No Known Activations