    Explanations

    Expressions of liking or affection

    Negative Logits
    " Asher"        -0.75
    "èvre"          -0.72
    " Hernandez"    -0.71
    " Pennington"   -0.68
    " Berman"       -0.68
    " Crowe"        -0.67
    "<h6>"          -0.65
    "шель"          -0.63
    "codegen"       -0.63
    " din"          -0.62
    Positive Logits
    " liked"        0.96
    " Liked"        0.92
    "dislike"       0.88
    " Likes"        0.86
    "Likes"         0.84
    " gusta"        0.84
    " Lik"          0.81
    " dislike"      0.81
    " liking"       0.78
    "👍👍"          0.77
    Activation density: 0.049%

    No Known Activations