INDEX
    Explanations
    Negative Logits
    " refreshed"     -0.07
    " innocence"     -0.07
    " Về"            -0.07
    " Kul"           -0.07
    ".Objects"       -0.07
    " Priest"        -0.06
    "Changing"       -0.06
    " mindset"       -0.06
    ".TextInput"     -0.06
    " undertaken"    -0.06
    Positive Logits
    "DY"             0.08
    ".getMinutes"    0.07
    "ise"            0.07
    " Gör"           0.07
    "ekce"           0.06
    "mask"           0.06
    "early"          0.06
    "_Tr"            0.06
    "inston"         0.06
    "ерів"           0.06
    Activation Density: 0.001%

    No Known Activations