    Negative Logits
    " justified"     -0.07
    "////////////"   -0.07
    " regret"        -0.07
    " intensive"     -0.07
    " defend"        -0.06
    "_double"        -0.06
    " itr"           -0.06
    "(it"            -0.06
    " tests"         -0.06
    " invitation"    -0.06
    Positive Logits
    0.07
    _PHYS
    0.07
    categorias
    0.06
    ERİ
    0.06
     безопас
    0.06
    <location
    0.06
    .Country
    0.06
    Telefono
    0.06
    opy
    0.06
     эффек
    0.06
    Activation Density: 0.013%

    No Known Activations