    Explanations

    descriptive verbs that convey perception and appearances

    Negative Logits
    "irl"       -0.06
    "ntax"      -0.06
    "ocale"     -0.06
    "lis"       -0.06
    " maint"    -0.06
    "ç¶Ń"       -0.06
    "anc"       -0.06
    "zos"       -0.06
    "gan"       -0.06
    "esz"       -0.06

    Positive Logits
    " like"      0.28
    " Like"      0.21
    "Like"       0.20
    "like"       0.19
    " LIKE"      0.19
    "_like"      0.18
    " likes"     0.17
    " như"       0.17
    " wie"       0.17
    ".like"      0.17

    Activation Density: 0.014%

    No Known Activations