INDEX
    Explanations

    phrases indicating direction or movement

    Negative Logits
    Token      Logit
    "ulton"    -0.18
    " Sist"    -0.15
    " Dias"    -0.15
    "eft"      -0.15
    "dn"       -0.15
    "imest"    -0.15
    "ths"      -0.14
    "engu"     -0.14
    " dias"    -0.14
    "illon"    -0.13
    Positive Logits
    Token      Logit
    "icens"    0.15
    "isay"     0.15
    "anka"     0.15
    "terr"     0.14
    "INUX"     0.14
    "778"      0.14
    "eÄį"      0.14
    "923"      0.14
    "inous"    0.13
    "508"      0.13
    Activation Density: 0.051%

    No Known Activations