INDEX
    NEGATIVE LOGITS
     Birth              -0.07
     insulting          -0.07
    —who                -0.07
    —that               -0.07
    *-                  -0.07
     theater            -0.07
     collaborations     -0.06
     Sheldon            -0.06
    )”                  -0.06
     sincerity          -0.06
    POSITIVE LOGITS
     našeho             0.07
    (mean               0.07
    ному                0.06
    (reordered          0.06
     мож                0.06
    busy                0.06
    اهد                 0.06
    lrt                 0.06
     zástup             0.06
    ­i                  0.06
    Activation Density: 0.102%

    No Known Activations