    Explanations

    concepts of change and movement in various contexts

    Negative Logits
    Token         Logit
    " upward"     -0.15
    "atica"       -0.14
    " outr"       -0.14
    "ucht"        -0.14
    "luv"         -0.13
    "onn"         -0.13
    "mina"        -0.13
    "roke"        -0.13
    "irl"         -0.13
    "aida"        -0.13

    Positive Logits
    Token         Logit
    " towards"     0.64
    " toward"      0.63
    " away"        0.56
    " Towards"     0.47
    "Towards"      0.46
    " Away"        0.44
    " Tow"         0.43
    "away"         0.42
    " hacia"       0.40
    "Away"         0.38

    Activation Density: 0.096%

    No Known Activations