INDEX
    Explanations
    Negative Logits
    Token              Logit
    " undertook"       -1.37
    " knew"            -1.36
    " wrote"           -1.29
    "はなく"            -1.23
    "已经"              -1.22
    " threw"           -1.22
    "始めた"            -1.21
    " stole"           -1.18
    " confirmación"    -1.17
    " withdrew"        -1.16
    Positive Logits
    Token              Logit
    " taken"           1.45
    " kept"            1.27
    " deceit"          1.11
    " lahko"           1.10
    " capaces"         1.09
    " גע"              1.08
    " eaten"           1.06
    " Terrasse"        1.03
    " subjected"       1.03
    " donnant"         1.02
    Activation Density: 0.355%

    No Known Activations