INDEX
    Explanations

    phrases indicating action or movement

    Negative Logits
    ntag     -0.19
    rana     -0.17
    pok      -0.16
    rac      -0.15
    ivil     -0.15
    esk      -0.14
    ippi     -0.14
    ahoma    -0.14
    opor     -0.14
    ndl      -0.14
    Positive Logits
     solo          0.17
    .cloudflare    0.17
    @Spring        0.16
    ef             0.16
    Convention     0.15
    erness         0.15
    endum          0.15
     ìŀij          0.15
    они            0.15
    alone          0.14
    Activation Density 0.256%

    No Known Activations