INDEX
    Explanations

    expressions of gratitude and appreciation

    NEGATIVE LOGITS
    "û"         -0.14
    "oler"      -0.14
    "ород"      -0.13
    "族自治"    -0.13
    "umm"       -0.12
    "ourage"    -0.12
    " toler"    -0.12
    "toa"       -0.12
    " же"       -0.12
    " taj"      -0.12
    POSITIVE LOGITS
    " thanks"   0.77
    " thank"    0.77
    " Thanks"   0.71
    " Thank"    0.67
    " THANK"    0.67
    "thanks"    0.66
    "Thanks"    0.66
    "thank"     0.62
    "Thank"     0.61
    " gracias"  0.59
    Act Density 0.363%

    No Known Activations