    Explanations

    directional and movement-related language

    Negative Logits
    Token                  Logit
    'ElementException'     -0.18
    'ÅĻÃŃd'                -0.16
    'udd'                  -0.15
    'ingly'                -0.15
    'uzey'                 -0.14
    'foy'                  -0.14
    'ëĬ'                   -0.14
    'gi'                   -0.14
    'ingo'                 -0.13
    'eniable'              -0.13
    Positive Logits
    Token                  Logit
    'wards'                 0.24
    'ward'                  0.21
    ' onto'                 0.21
    ' into'                 0.20
    'WARD'                  0.17
    'onto'                  0.16
    'oward'                 0.16
    ' towards'              0.15
    ' toward'               0.15
    ' Into'                 0.15
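
    The page does not say how these logit lists are computed. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding and sorts the resulting per-token scores. The sketch below follows that approach. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the toy vectors are illustrative. They are not taken from this feature.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# Assumption: effect[token] = dot(unembedding vector of token, feature decoder vector).

def logit_effects(decoder_vec, unembed):
    """Return one score per vocab token.

    unembed is a list of per-token vectors (vocab_size rows, d_model columns).
    """
    return [sum(w * d for w, d in zip(row, decoder_vec)) for row in unembed]


def top_and_bottom(scores, vocab, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom = [(vocab[i], scores[i]) for i in order[:k]]
    top = [(vocab[i], scores[i]) for i in reversed(order[-k:])]
    return top, bottom


# Toy example with a 2-dimensional model (illustrative numbers only).
vocab = ["wards", " onto", "udd", "gi"]
unembed = [[0.24, 0.0], [0.21, 0.1], [-0.16, 0.0], [-0.14, 0.3]]
decoder_vec = [1.0, 0.0]

scores = logit_effects(decoder_vec, unembed)
top, bottom = top_and_bottom(scores, vocab, k=2)
print(top)     # highest-scoring tokens first
print(bottom)  # lowest-scoring tokens first
```

    Under this assumption, a token with a leading space (such as ' onto') is a distinct vocabulary entry from 'onto'. That is why both spellings can appear separately in the list.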
    Activation density: 0.158%
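
    The page does not define activation density. A minimal sketch follows, assuming density means the percentage of evaluated tokens on which the feature's activation exceeds zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens where
# the feature's activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Example: a feature that fires on 2 of 1000 tokens has a density of 0.2%.
acts = [0.0] * 998 + [1.5, 0.3]
print(activation_density(acts))
```

    Read this way, a density of 0.158% means the feature fires on roughly 1.6 tokens per thousand.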

    No Known Activations