    Explanations

    phrases indicating movement or direction

    Negative Logits
    Token         Logit
    " out"        -0.08
    "odore"       -0.08
    "adesh"       -0.08
    "nite"        -0.07
    "ãĢħ"         -0.07
    " up"         -0.07
    "ily"         -0.07
    "kup"         -0.07
    "ìĸ´ëĤĺ"      -0.07
    "-être"       -0.07
    Positive Logits
    Token         Logit
    "/down"       0.12
    "wards"       0.10
    "/off"        0.08
    "/on"         0.08
    "WARDS"       0.07
    "/out"        0.07
    "sert"        0.07
    "datable"     0.07
    "pers"        0.07
    "ensively"    0.06
    Activation Density: 0.064%

    No Known Activations