INDEX
    Explanations

    phrases indicating a certainty or conclusive statement

    phrases indicating uncertainty or questioning

    Negative Logits
    Token          Logit
    "iliated"      -0.66
    "ourses"       -0.63
    " seiz"        -0.63
    " confir"      -0.62
    "ourse"        -0.61
    "pread"        -0.61
    " tremend"     -0.60
    " undermin"    -0.59
    "etheus"       -0.58
    "illary"       -0.57
    Positive Logits
    Token          Logit
    " ;)"          0.95
    " ðŁĺ"         0.79
    " :)"          0.78
    " :-)"         0.78
    " haha"        0.78
    "…)"           0.75
    " 🙂"          0.75
    "…."           0.73
    "?!"           0.71
    "!"            0.71
    Activation Density 0.231%

    No Known Activations