INDEX
    Explanations
    Negative Logits
    Token           Logit
    " чт"           -0.07
    " DELETE"       -0.06
    (blank)         -0.06
    " DL"           -0.06
    " writings"     -0.06
    " downloaded"   -0.06
    "asics"         -0.06
    "("'"           -0.06
    "ilation"       -0.06
    " drift"        -0.06
    Positive Logits
    Token           Logit
    "_trans"        0.13
    ".trans"        0.13
    "trans"         0.11
    " trans"        0.10
    "\ttrans"       0.10
    ".Trans"        0.09
    "_Trans"        0.08
    "-trans"        0.08
    " TRANS"        0.08
    "Trans"         0.08
    Activation Density: 0.007%

    No Known Activations