INDEX
    Explanations
    Negative Logits
    " derivative"    -0.07
    "Granted"        -0.07
    " nev"           -0.07
    " disg"          -0.06
    "iform"          -0.06
    "estroy"         -0.06
    " shuffle"       -0.06
    " Westbrook"     -0.06
    " minimal"       -0.06
    " shaving"       -0.06
    Positive Logits
    "леж"            0.08
    "ité"            0.06
                     0.06
    " navegador"     0.06
    " agenda"        0.06
    " زیست"          0.06
    " Koch"          0.06
    "Some"           0.06
    " abound"        0.06
    "\tglm"          0.06
    Activation Density: 0.016%

    No Known Activations