    Explanations

    phrases indicating a type or category, often associated with subjective descriptions

    Negative Logits
    Token             Logit
    "chg"             -0.17
    "ерш"             -0.17
    "utter"           -0.16
    "imizer"          -0.16
    "uat"             -0.14
    " POSSIBILITY"    -0.14
    "imuth"           -0.14
    "тю"              -0.14
    "remely"          -0.14
    " distract"       -0.13
    Positive Logits
    Token             Logit
    " like"           0.21
    "-sort"           0.21
    " semi"           0.16
    "antity"          0.16
    " Like"           0.16
    "thing"           0.16
    "-таки"           0.15
    " tw"             0.15
    " LIKE"           0.15
    "/s"              0.15
    Act Density 0.031%

    No Known Activations