    Negative Logits
     whom             -0.08
    arnation          -0.07
    oci               -0.06
     SCC              -0.06
     simplex          -0.06
     необходимости    -0.06
     законом          -0.06
     hvis             -0.06
     wäre             -0.06
    dio               -0.06
    Positive Logits
     أغ               0.07
     Civilization     0.07
    이나                0.06
     parchment        0.06
    quoise            0.06
    neh               0.06
    ुध                0.06
     NSW              0.06
    Sept              0.06
                      0.06
    Activation Density: 0.026%

    No Known Activations