    Negative Logits
    Token           Logit
    "Damage"        -0.07
    " explanation"  -0.07
    "-solving"      -0.07
    " Fav"          -0.06
    " IsValid"      -0.06
    " belir"        -0.06
    " Accred"       -0.06
                    -0.06
    "-project"      -0.06
    " annum"        -0.06
    Positive Logits
    Token           Logit
    " خرد"          0.06
    " Appears"      0.06
    " těž"          0.06
    " hard"         0.06
    "SYM"           0.06
    " tort"         0.06
    " انت"          0.06
    "HG"            0.06
    " muse"         0.06
    " hai"          0.05
    Activation Density: 0.015%

    No Known Activations