    Negative Logits
        "ума"             -0.07
        " Nan"            -0.07
        "/*----------------------------------------------------------------"  -0.07
        " Angola"         -0.07
        " WAR"            -0.06
        " Lond"           -0.06
        "_PAGES"          -0.06
        " вій"            -0.06
        "Push"            -0.06
        " WM"             -0.06
    Positive Logits
        " correct"        0.18
        " incorrect"      0.13
        " correctly"      0.12
        " Correct"        0.12
        "Correct"         0.10
        "correct"         0.10
        "(correct"        0.08
        " Incorrect"      0.08
        "incorrect"       0.08
        " incorrectly"    0.08
    Act Density 0.026%

    No Known Activations