INDEX
    Explanations

    actions associated with movement or departure

    Negative Logits
    Token        Logit
    "ahn"        -0.16
    "rong"       -0.16
    "itre"       -0.15
    " تقو"       -0.14
    "nl"         -0.14
    "738"        -0.14
    "758"        -0.14
    "nown"       -0.14
    "glas"       -0.14
    "dos"        -0.13
    Positive Logits
    Token         Logit
    " leaving"    0.30
    " Leaving"    0.25
    " leave"      0.22
    " Leave"      0.20
    " leaves"     0.20
    " toward"     0.20
    "Leave"       0.20
    " headed"     0.19
    " into"       0.19
    " direction"  0.19
    Activation Density: 0.124%

    No Known Activations