INDEX
    Explanations
    Negative Logits
    ,但是: -0.08
     considerable: -0.07
    isos: -0.07
    undos: -0.07
    -0.07
    theros: -0.07
     imper: -0.07
    road: -0.07
     peril: -0.07
    atoa: -0.07
    Positive Logits
     NOTE: 0.09
     Melissa: 0.09
     Alm: 0.08
     Dar: 0.08
     การ: 0.08
     Meh: 0.07
     ә: 0.07
    0.07
     Chen: 0.07
    ayin: 0.07
    Activation Density: 0.277%

    No Known Activations