    Explanations

    commands or requests to stop doing something

    Negative Logits
    orthy       -0.82
    ridge       -0.77
    ocene       -0.77
    eer         -0.77
    り          -0.73
    essee       -0.72
    aceae       -0.72
    Sov         -0.71
    dds         -0.71
    rocket      -0.69
    Positive Logits
     bothering   1.38
     wasting     1.22
     worrying    1.19
     pretending  1.14
     whining     1.11
     messing     1.10
     caring      1.09
     behaving    1.06
     talking     1.02
     abusing     1.02
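    On dashboards of this kind, the two logit lists are typically produced by projecting the feature's output (decoder) direction onto the model's unembedding matrix and keeping the most positive and most negative scores. The page itself does not state the procedure, so the sketch below assumes it. `decoder_dir`, `unembed`, `logit_contributions`, and `top_k_logits` are hypothetical names, and the toy vectors are invented for illustration; they are not this feature's real weights.

```python
# Hypothetical sketch, assuming each logit score is the dot product of the feature's
# decoder direction with a token's unembedding vector (not confirmed by this page).

def logit_contributions(decoder_dir, unembed):
    """Return {token: score}, where score = decoder_dir . unembed[token]."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }

def top_k_logits(scores, k):
    """Split scores into the k most positive and the k most negative tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1])  # ascending by score
    negative = ranked[:k]           # most negative first
    positive = ranked[::-1][:k]     # most positive first
    return positive, negative

# Toy example: made-up 2-d vectors, chosen only to mimic the shape of the lists above.
decoder_dir = [1.0, -0.5]
unembed = {
    " bothering": [1.0, -0.4],
    " wasting": [0.8, -0.6],
    "orthy": [-0.6, 0.4],
    "ridge": [-0.5, 0.5],
    "rocket": [0.1, 0.2],
}
scores = logit_contributions(decoder_dir, unembed)
positive, negative = top_k_logits(scores, 2)
print(positive)  # the two tokens this toy direction promotes most
print(negative)  # the two tokens it suppresses most
```

    Tokens keep their leading space (" bothering" rather than "bothering") because a tokenizer usually treats the two as different tokens, which is why the positive list above reads as continuations of a word such as "stop".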
    Activation Density 0.041%
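    Activation density is read here as the percentage of token positions on which the feature's activation is positive. The zero threshold and the helper name `activation_density` are assumptions, not something this page specifies. Under that reading, 0.041% means the feature fires on roughly one token in 2,400 (100 / 0.041 ≈ 2,439).

```python
# Hypothetical sketch: activation density as the percentage of tokens where the
# feature fires, assuming "fires" means activation strictly greater than zero.

def activation_density(activations):
    """Percentage of token positions whose activation is > 0."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# 41 firing positions out of 100,000 tokens reproduces the 0.041% figure.
acts = [0.0] * 99_959 + [1.0] * 41
print(activation_density(acts))
```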

    No Known Activations