    Negative Logits
    Token             Logit
    "Correct"         -0.07
    " Cout"           -0.07
    " CW"             -0.06
    " Transitional"   -0.06
    " mourn"          -0.06
    " запах"          -0.06
    (blank)           -0.06
    "/repository"     -0.06
    "Americ"          -0.06
    ".ByteString"     -0.06

    Positive Logits
    Token             Logit
    " ability"        0.09
    "_V"              0.08
    "_bb"             0.07
    " здат"           0.07
    (blank)           0.07
    ".est"            0.07
    " adjusts"        0.07
    " incapable"      0.06
    "artner"          0.06
    " قادر"           0.06
    Activation Density: 0.031%

    No Known Activations