    Explanations

    expressions and phrases related to appearances or descriptions

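    Tokens are quoted; a leading space is part of the token.

    Negative Logits
    Token               Logit
    "MessageTagHelper"  -0.68
    "Similar"           -0.66
    "similar"           -0.61
    " Similar"          -0.61
    " faſt"             -0.57
    "Like"              -0.56
    " Like"             -0.56
    "как"               -0.55
    "LIKE"              -0.55
    "unlike"            -0.54

    Positive Logits
    Token               Logit
    " liked"             0.73
    " likes"             0.70
    " liking"            0.67
    " lượt"              0.63
    " lie"               0.60
    "تقاوى"              0.56
    " li"                0.54
    " linke"             0.52
    "喜歡"               0.49
    "喜欢"               0.48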
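    The page does not say how these logit values are obtained. One common approach for sparse-autoencoder features is direct logit attribution: multiply the feature's decoder direction by the model's unembedding matrix, then sort the resulting per-token scores. The sketch below assumes that approach; `top_logits`, `decoder_dir`, `W_U`, and the toy matrix are hypothetical, and the toy values are borrowed from the lists above only for illustration.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by how strongly a feature's decoder direction raises or
    lowers their logits (direct-logit-attribution sketch; hypothetical helper)."""
    # Project the d_model-sized decoder direction onto every token's
    # unembedding column: one scalar logit effect per vocabulary entry.
    logit_effect = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(logit_effect)          # indices, most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, five-token vocabulary. Values are illustrative only.
vocab = ["liked", "likes", "Similar", "unlike", "the"]
W_U = np.array([
    [0.73, 0.70, -0.68, -0.54, 0.0],   # unembedding weights along dimension 0
    [0.00, 0.00,  0.00,  0.00, 0.0],   # dimension 1 contributes nothing here
])
decoder_dir = np.array([1.0, 0.0])      # hypothetical feature decoder direction

negative, positive = top_logits(decoder_dir, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

    Real pipelines may also fold in layer-norm scaling or normalize the decoder direction, so absolute values from this sketch need not match a dashboard's numbers; only the ranking logic is illustrated.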
    Activation density: 0.164%
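    Activation density is usually read as the share of sampled tokens on which the feature is active. If "active" means an activation above zero (the threshold and token sample behind this page are not stated), 0.164% corresponds to about 164 firing positions per 100,000 tokens. A minimal sketch, with `activation_density` and the toy data as hypothetical stand-ins:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions at which a feature fires, i.e. its
    activation exceeds `threshold` (hypothetical helper)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy data: the feature fires on 164 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:164] = 2.5
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

    The `%` format type multiplies by 100, so a raw fraction of 0.00164 prints as 0.164%.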

    No Known Activations