INDEX
    Explanations
    Negative Logits
     steroid    -0.08
    (gt         -0.06
     Lt         -0.06
     detective  -0.06
    "I          -0.06
    	def        -0.06
     subst      -0.06
    938         -0.06
     stencil    -0.06
     licz       -0.06
    Positive Logits
    =           0.09
    ال          0.08
    ار          0.08
    +           0.08
    al          0.08
     #$         0.07
    |=          0.07
                0.07
    ↵           0.07
    ↵           0.07
    Act Density 0.039%

    No Known Activations