INDEX
    Explanations

    actions indicating change or movement

    Negative Logits
    Token        Logit
    " katika"    -0.59
    " nella"     -0.53
                 -0.52
                 -0.52
    " kwenye"    -0.46
    " nell"      -0.46
    " trong"     -0.43
    " presso"    -0.42
    "aronder"    -0.41
    " nelle"     -0.41
    Positive Logits
    Token        Logit
    " IN"        0.82
    "in"         0.79
    "进来"       0.73
    "getIn"      0.68
    " in"        0.66
    " inn"       0.66
    "IN"         0.66
    " ins"       0.64
    "进去"       0.63
    " In"        0.62
    Activation Density: 0.347%

    No Known Activations