INDEX
    Explanations
    Negative Logits
    Token                      Value
    "(numpy"                   -0.08
    "\tbyte"                   -0.07
    "_player"                  -0.06
    "(pref"                    -0.06
    "\twrite"                  -0.06
    (whitespace-only token)    -0.06
    " кто"                     -0.06
    " Philips"                 -0.06
    " satire"                  -0.06
    " husbands"                -0.06
    Positive Logits
    Token                      Value
    "<K"                       0.07
    "··"                       0.06
    "unes"                     0.06
    ".Cont"                    0.06
    "/home"                    0.06
    "ничес"                    0.06
    (blank)                    0.06
    "yne"                      0.06
    "ца"                       0.06
    "Pr"                       0.06
    Activation Density: 0.062%

    No Known Activations