INDEX
    Explanations

    words related to communication with emphasis or affirmation

    conversational phrases that express humor or sarcasm

    Negative Logits
    Token             Logit
    "pires"           -0.73
    "minster"         -0.66
    "Interestingly"   -0.63
    "³"               -0.62
    "abase"           -0.61
    "Hack"            -0.60
    "Result"          -0.60
    "aneous"          -0.60
    "lier"            -0.59
    "Page"            -0.59
    Positive Logits
    Token             Logit
    " please"         1.02
    "_-"              1.00
    " lest"           1.00
    " unless"         0.99
    " because"        0.89
    "unless"          0.87
    " they"           0.84
    " it"             0.83
    " we"             0.83
    " otherwise"      0.82
    Activation Density: 0.267%

    No Known Activations