INDEX
    NEGATIVE LOGITS
    Token            Logit
    "Educ"           -0.07
    " CST"           -0.07
    " NC"            -0.07
    " enthusiast"    -0.07
    " nc"            -0.07
    "_CREAT"         -0.07
    " تکن"           -0.07
    " cet"           -0.07
    "getC"           -0.07
    " Isle"          -0.07
    POSITIVE LOGITS
    Token            Logit
    " wrong"         0.13
    "wrong"          0.09
    " WRONG"         0.08
    "Wrong"          0.08
    " Wrong"         0.08
    " Left"          0.07
    " Wonder"        0.07
    " wrongly"       0.07
    "_WRONG"         0.07
    " drawbacks"     0.07
    Activation Density: 0.016%

    No Known Activations