INDEX
    NEGATIVE LOGITS
    Token           Logit
     embarrassing   -0.07
     Herr           -0.07
     Democracy      -0.07
     ler            -0.07
    <br             -0.07
     Francisco      -0.07
     Autism         -0.07
     perpendicular  -0.07
     vocabulary     -0.07
     karşı          -0.06
    POSITIVE LOGITS
    Token           Logit
    (not shown)     0.07
    ATING           0.07
    (not shown)     0.07
    GLenum          0.07
    شن              0.07
    קלט             0.07
     בינלאומי       0.07
    (not shown)     0.07
    OTION           0.07
     approvals      0.07
    Activation Density: 0.339%

    No Known Activations