    Negative Logits
    Token              Logit
    " attacking"       -0.08
    " pred"            -0.07
    " attack"          -0.07
    "_minus"           -0.06
    "ot"               -0.06
    "_DEF"             -0.06
    (blank)            -0.06
    " PropertyValue"   -0.06
    "xfff"             -0.06
    " Dud"             -0.06
    Positive Logits
    Token              Logit
    "cm"               0.08
    "ecome"            0.07
    "izmet"            0.07
    "orarily"          0.07
    " کمی"             0.07
    " cm"              0.07
    " cellular"        0.07
    (blank)            0.07
    "imleri"           0.07
    "urm"              0.07
    Activation Density: 0.006%

    No Known Activations