INDEX
    NEGATIVE LOGITS
    " defeated"   -0.07
    " Vuex"       -0.06
    " matchup"    -0.06
    " knih"       -0.06
    "dif"         -0.06
    " dinners"    -0.06
    " Muham"      -0.06
    " rağmen"     -0.06
    (blank)       -0.06
    " Verse"      -0.06
    POSITIVE LOGITS
    "َة"          0.08
    " dạng"       0.07
    " Uttar"      0.06
    " Firearms"   0.06
    " bản"        0.06
    " chính"      0.06
    "Expense"     0.06
    "Connector"   0.06
    "asından"     0.06
    "-runtime"    0.06
    Activation Density: 0.084%

    No Known Activations