INDEX
    Explanations
    Negative Logits
    Token          Logit
    " ocen"        -0.09
    " těch"        -0.09
    " ضد"          -0.08
    " tohoto"      -0.08
    " chis"        -0.08
    " jsou"        -0.08
    "ేత"           -0.08
    "ాల"           -0.08
    " against"     -0.08
    "ుష"           -0.08
    Positive Logits
    Token          Logit
    " excerpt"     0.09
    " vitamin"     0.08
    "изи"          0.08
    " Vocabulary"  0.08
    (blank)        0.08
    " vocab"       0.07
    "圖片"         0.07
    "(sentence"    0.07
    " extract"     0.07
    " paragraph"   0.07
    Activation Density: 0.081%

    No Known Activations