INDEX
    Explanations

    phrases indicating understanding or comprehension

    expressions of understanding or empathy

    Negative Logits
    âĢ      -0.97
    âĢİ     -0.86
    à¨      -0.83
    à©      -0.83
    æł      -0.81
    ãĤ£     -0.79
    è»      -0.79
     âĢ     -0.79
    etheus  -0.76
     à¨     -0.76
    Positive Logits
     ;)       1.12
     haha     1.05
     :)       1.03
    !?        1.02
     :(       1.01
    ?!        1.01
    !         0.99
    ...?      0.91
     dude     0.90
     anyways  0.90
    Act Density 0.701%

    No Known Activations