    Negative Logits
    " itu"           -0.08
    " expedition"    -0.08
    " comedy"        -0.08
    "heb"            -0.08
    " precious"      -0.08
    " части"         -0.08
    " الإد"          -0.07
    "(es"            -0.07
    " ave"           -0.07
    "schaft"         -0.07
    Positive Logits
    " Admit"         0.09
    "gin"            0.08
    "791"            0.07
    "Grow"           0.07
    " Facial"        0.07
    "假的"           0.07
    " facial"        0.07
    " awakened"      0.07
    " مارچ"          0.07
    " heated"        0.07
    Activation Density: 0.002%

    No Known Activations