INDEX
    Explanations

    Before and after states

    NEGATIVE LOGITS
    Token                 Logit
    "uttle"               -0.08
    "ahlen"               -0.07
    "................"    -0.07
    " apolog"             -0.07
    ".Up"                 -0.07
    " Daniels"            -0.07
    "备用"                -0.07
    "udin"                -0.07
    " الهواء"             -0.07
    " ofte"               -0.07
    POSITIVE LOGITS
    Token                 Logit
    " oorspronkelijke"    0.13
    " ursprüng"           0.12
    " originally"         0.12
    "Originally"          0.12
    " originalmente"      0.11
    " Originally"         0.10
    " oorspronk"          0.10
    " existed"            0.10
    "Were"                0.10
    " sebelumnya"         0.10
    Activation Density: 0.050%

    No Known Activations