    Negative Logits
    " spel"             -0.07
    (whitespace)        -0.07
    "_arg"              -0.07
    (whitespace)        -0.06
    "jekt"              -0.06
    (token not shown)   -0.06
    (token not shown)   -0.06
    "gium"              -0.06
    "qw"                -0.06
    "قى"                -0.06
    Positive Logits
    " swapped"          0.07
    "_reduce"           0.07
    " hend"             0.06
    " undercut"         0.06
    " Floral"           0.06
    " δημο"             0.06
    "termin"            0.06
    (token not shown)   0.06
    "iationException"   0.06
    " Decoder"          0.06
    Act Density 0.007%

    No Known Activations