    Negative Logits
    Token            Logit
    " chiếm"         -0.07
    "ái"             -0.07
    (blank)          -0.06
    (whitespace)     -0.06
    " theater"       -0.06
    " eaten"         -0.06
    " Wiki"          -0.06
    " enactment"     -0.06
    "ích"            -0.06
    " Eaton"         -0.06
    Positive Logits
    Token            Logit
    " sure"          0.13
    " Sure"          0.08
    "Sure"           0.08
    "-unstyled"      0.07
    "sure"           0.07
    " مستقیم"        0.07
    " haircut"       0.07
    "mul"            0.07
    " clear"         0.07
    " hamburg"       0.07
    Activation Density: 0.019%

    No Known Activations