    Negative Logits
    Token             Logit
    "in"              -0.07
    "pred"            -0.06
    " Toplam"         -0.06
    "_library"        -0.06
    "(expression"     -0.06
    " vertex"         -0.06
    " suggests"       -0.06
    " democratic"     -0.06
    "utar"            -0.06
    "šel"             -0.06
    Positive Logits
    Token             Logit
    " anus"           0.07
    " Pas"            0.07
    "озя"             0.07
    " напря"          0.07
    " нак"            0.06
    " Palo"           0.06
    (token missing)   0.06
    (token missing)   0.06
    " Rowling"        0.06
    " blackjack"      0.06
    Activation Density: 0.006%

    No Known Activations