INDEX
    Explanations

    expressions of perception or opinion

    Negative Logits
    Token            Logit
    " themselves"    -0.16
    "nev"            -0.14
    "uest"           -0.14
    "入れ"           -0.14
    " himself"       -0.14
    "oes"            -0.13
    " herself"       -0.13
    "τους"           -0.13
    "hte"            -0.13
    " itself"        -0.13

    Positive Logits
    Token            Logit
    " like"          0.26
    " Like"          0.22
    " clear"         0.21
    "Like"           0.20
    "like"           0.19
    "_like"          0.18
    ".like"          0.18
    " như"           0.18
    " likes"         0.18
    " LIKE"          0.18
    Activation Density: 0.035%

    No Known Activations