INDEX
    Explanations

    Correcting errors; "they meant"

    NEGATIVE LOGITS
     basic          -0.08
     ਇੱਕ            -0.08
                    -0.08
    Declar          -0.07
     બને            -0.07
    /dialog         -0.07
     мног           -0.07
    atiem           -0.07
     дело           -0.07
     создан         -0.07
    POSITIVE LOGITS
     errone         0.12
     mistakenly     0.11
     inaccur        0.11
     incorrectly    0.11
    Incorrect       0.11
     misguided      0.11
     unfortunately  0.10
     wrongly        0.10
    _wrong          0.10
     supposedly     0.10
    Act Density 0.067%

    No Known Activations