    Explanations

    references to physical actions and movement in training contexts

    Negative Logits
    Token          Logit
    "icont"        -0.15
    " vess"        -0.15
    "390"          -0.15
    " اث"          -0.14
    "untime"       -0.14
    ".rmi"         -0.14
    " vic"         -0.14
    "fuse"         -0.14
    "980"          -0.14
    "ulti"         -0.14

    Positive Logits
    Token          Logit
    " Canter"      0.22
    " rein"        0.22
    " diagonal"    0.19
    "hind"         0.18
    " trot"        0.16
    " halt"        0.16
    " Trot"        0.16
    "/bit"         0.16
    " hal"         0.15
    "igits"        0.15
    Activation Density: 0.022%

    No Known Activations