    Explanations

    phrases indicating surprise or disbelief

    phrases expressing a sense of denial or lack of awareness

    Negative Logits
    Token       Logit
    "idon"      -0.86
    "rend"      -0.73
    "abre"      -0.72
    "ubi"       -0.69
    "rog"       -0.69
    "ijing"     -0.67
    "osity"     -0.67
    "orm"       -0.66
    "center"    -0.65
    "aim"       -0.65

    Positive Logits
    Token           Logit
    " remotely"     1.29
    " bothered"     0.99
    " bother"       0.97
    " bothering"    0.95
    " pretend"      0.88
    " halfway"      0.79
    " close"        0.78
    " kidding"      0.78
    " mention"      0.77
    " hint"         0.77

    Act Density 0.049%

    No Known Activations