INDEX
    Explanations

    positive phrases and statements

    expressions of positive emotions and sentiments

    Negative Logits
    Token             Logit
    "failed"          -0.77
    " censored"       -0.71
    "cens"            -0.71
    " incompet"       -0.69
    " Restrict"       -0.69
    " antagonists"    -0.68
    " offending"      -0.67
    " delinquent"     -0.67
    " redacted"       -0.67
    " obscure"        -0.67
    Positive Logits
    Token             Logit
    " appreciated"    0.97
    " compliment"     0.97
    " welcome"        0.97
    " compliments"    0.97
    " gratitude"      0.97
    " grateful"       0.95
    " commend"        0.95
    " delighted"      0.95
    " positive"       0.93
    " invaluable"     0.93
    Activation Density: 1.647%

    No Known Activations