INDEX
    Explanations

    expressions indicating perception or observation

    Negative Logits
    Token          Logit
    " supposedly"  -0.19
    "esian"        -0.15
    "itol"         -0.15
    "uner"         -0.14
    "readcr"       -0.14
    " presumably"  -0.14
    "quets"        -0.14
    "quet"         -0.14
    "ialis"        -0.13
    " aspiring"    -0.13
    Positive Logits
    Token          Logit
    "ingly"         0.29
    "lessly"        0.28
    " like"         0.28
    " intent"       0.21
    " likes"        0.19
    "intent"        0.18
    " Like"         0.17
    "like"          0.17
    "Like"          0.17
    " liked"        0.17
    Act Density 0.040%

    No Known Activations