INDEX
    Explanations
    NEGATIVE LOGITS
     нач
    -0.07
                                                              
    -0.07
    ,为
    -0.07
    Compatibility
    -0.07
    -0.07
    Free
    -0.06
    kara
    -0.06
     своих
    -0.06
     nonce
    -0.06
     использов
    -0.06
    POSITIVE LOGITS
     reduction
    0.09
     reductions
    0.08
     Reduction
    0.08
    ізнес
    0.07
    pickup
    0.07
    >'.↵
    0.07
     ikt
    0.07
    drop
    0.07
    lds
    0.06
     Γκ
    0.06
    Act Density 0.008%

    No Known Activations