INDEX
    Explanations
    Negative Logits
    Token            Logit
    " testified"     -0.07
    " групп"         -0.07
    " deceive"       -0.07
    " Cycling"       -0.07
    "222"            -0.06
    " nikdo"         -0.06
    " muž"           -0.06
    " усіх"          -0.06
    " actress"       -0.06
    "Boolean"        -0.06
    Positive Logits
    Token                                 Logit
    " atrib"                              0.07
    " rozší"                              0.07
    "MES"                                 0.07
    " Ada"                                0.07
    "////////////////////////////////"    0.06
    "لط"                                  0.06
    "podob"                               0.06
    " VA"                                 0.06
    " úprav"                              0.06
    "ADMIN"                               0.06
    Activation Density: 0.076%

    No Known Activations