INDEX
    Explanations
    Negative Logits
    Token           Logit
    " impactful"    -0.08
    "一本"          -0.07
    " perman"       -0.07
    "એક"            -0.07
    " elderly"      -0.07
    "posts"         -0.07
    " bote"         -0.07
    "tel"           -0.07
    " veterin"      -0.07
    "creator"       -0.07

    Positive Logits
    Token           Logit
    " механизм"     0.10
    "ология"        0.09
    "sels"          0.09
    " занимается"   0.08
    "ೈನ"            0.08
    " maatregelen"  0.08
    " Maßnahmen"    0.08
    " комиссия"     0.08
    "措施"          0.08
    " технология"   0.08
    Activation Density: 0.004%

    No Known Activations