INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    " zaht"       -0.08
    "’avance"     -0.08
    "attack"      -0.08
    ".kind"       -0.08
    "Attack"      -0.07
    " gear"       -0.07
    ".aw"         -0.07
    " hợp"        -0.07
    "الم"         -0.07
    "_fe"         -0.07
    POSITIVE LOGITS
    Token         Logit
    " Ordinary"   0.08
    " nový"       0.08
    " effected"   0.08
    (blank)       0.08
    " Ordering"   0.08
    "unic"        0.07
    " carving"    0.07
    " Новый"      0.07
    " vivem"      0.07
    " commuters"  0.07
    Activation Density: 0.006%

    No Known Activations