INDEX
    Explanations

    emojis and exclamations

    Markers of heightened expressiveness and chat-turn boundaries, such as exclamation marks, emojis, and end-of-turn tokens.

    Negative Logits
    M  0.68
    ни  0.67
    ும்  0.64
    V  0.61
    (no visible token)  0.61
    не  0.61
    ला  0.59
    εια  0.58
    T  0.58
    B  0.57
    Positive Logits
     👋  0.57
     The  0.54
     😉  0.53
    é  0.53
     😀  0.52
     You  0.50
    (no visible token)  0.50
    (whitespace)  0.50
     Ве  0.48
     🙌  0.48
    Act Density 0.462%

    No Known Activations
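
The activation density figure above is usually read as the fraction of tokens on which a feature is active. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: above-threshold) activation. This is an illustrative definition,
    not one confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 1,000 tokens: the feature fires on 5 of them.
sample = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.462% would correspond to roughly 46 active tokens per 10,000.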