INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Logit
    " Burl"             -0.07
    " نج"               -0.07
    " 경북"             -0.06
    " كتب"              -0.06
    " flatt"            -0.06
    " endif"            -0.06
    " <<"               -0.06
    " Rit"              -0.06
    "С"                 -0.06
    (no token text)     -0.06
    POSITIVE LOGITS
    Token               Logit
    "New"               0.12
    " New"              0.11
    " new"              0.09
    " NEW"              0.07
    ".CreateInstance"   0.06
    " colleagues"       0.06
    "hero"              0.06
    " incompatible"     0.06
    "IAL"               0.06
    " anew"             0.06
    Activation Density: 0.006%

    No Known Activations