INDEX
    Explanations

    words related to actions or commands directed towards a specific person or group

    verbs indicating action or commands

    Negative Logits
    Token          Logit
    "mith"         -0.64
    "SPONSORED"    -0.63
    "printf"       -0.58
    " constitu"    -0.57
    " Mehran"      -0.56
    "dfx"          -0.56
    "]="           -0.55
    " disg"        -0.54
    " spons"       -0.53
    "cv"           -0.53
    Positive Logits
    Token            Logit
    " Yourself"      1.39
    " yourself"      1.24
    "ments"          1.16
    " Your"          1.15
    " your"          1.14
    " yourselves"    1.14
    "ings"           1.07
    "ment"           0.97
    "able"           0.96
    "ables"          0.95
    Activation Density: 0.301%

    No Known Activations