    Explanations

    expressions of excitement or enthusiasm

    Negative Logits
    Token             Logit
    " = "             -0.91
    "]-->"            -0.74
    ";-)"             -0.73
    " ;-)"            -0.72
    ":-)"             -0.69
    " Arxivat"        -0.67
    " "               -0.65
    "AnchorStyles"    -0.65
    " CHtml"          -0.63
    "(^_^;)"          -0.62
    Positive Logits
    Token             Logit
    " idk"            0.87
    " 🥺"             0.84
    " tbh"            0.83
    " ngl"            0.83
    " Idk"            0.81
    " 🥲"             0.77
    " lmao"           0.76
    "Idk"             0.75
    "🥺"              0.74
    " 😭"             0.72
    Act Density 0.122%

    No Known Activations