INDEX
    NEGATIVE LOGITS
    " importance"    -0.07
    "-bedroom"       -0.07
    " знаход"        -0.06
    "-нибудь"        -0.06
    "amentals"       -0.06
    "Interview"      -0.06
    "-U"             -0.06
    "одар"           -0.06
    " Trail"         -0.06
    "альному"        -0.06
    POSITIVE LOGITS
    " yardım"        0.06
    " zim"           0.06
    " mek"           0.06
    " dort"          0.06
    "esser"          0.06
    "گو"             0.06
    "authorized"     0.06
    "como"           0.06
    " döneminde"     0.06
    "de"             0.06
    Activation Density: 0.034%

    No Known Activations