    Negative Logits
    " firms"            -0.07
    " repo"             -0.07
    "+w"                -0.07
    " watts"            -0.07
    " neighborhoods"    -0.06
    "-wife"             -0.06
    " blogs"            -0.06
    "files"             -0.06
    "-na"               -0.06
                        -0.06
    Positive Logits
    " bitir"            0.07
    " billig"           0.07
    " nackte"           0.06
    " Все"              0.06
    "ecek"              0.06
    "料無料"            0.06
    " Рез"              0.06
    " Mic"              0.06
    " suited"           0.06
    "Consulta"          0.06
    Activation Density: 0.004%

    No Known Activations