    Explanations

    verbs indicating change or transformation

    Negative Logits
    Token         Logit
    ual           -0.17
    otre          -0.15
     handed       -0.15
    ari           -0.15
    ehr           -0.14
    074           -0.14
    steller       -0.14
     comm         -0.14
    ary           -0.14
    UAL           -0.14
    Positive Logits
    Token         Logit
    ãĤŃãĥ¥        0.17
     previously   0.16
    dden          0.16
    haar          0.15
    aines         0.15
    kj            0.15
    agos          0.14
     earlier      0.14
    ÑĢеÑģ          0.14
    ainen         0.14
    Activation Density: 0.468%

    No Known Activations