    Negative Logits
    " diffs"          -0.07
    "(void"           -0.07
    " Belgian"        -0.06
    " fallen"         -0.06
    "oku"             -0.06
    (not shown)       -0.06
    " programmer"     -0.06
    (not shown)       -0.06
    " greetings"      -0.06
    "غ"               -0.06
    Positive Logits
    " tours"          0.07
    " Los"            0.07
    " cov"            0.06
    "рай"             0.06
    " motiv"          0.06
    "filesize"        0.06
    " piv"            0.06
    " displaced"      0.06
    ".handleError"    0.06
    " exactly"        0.06
    Act Density 0.066%

    No Known Activations