INDEX
    Explanations
    Negative Logits
    (token not rendered)    -0.07
    "++){↵"                 -0.06
    " Dok"                  -0.06
    "__↵"                   -0.06
    " obedience"            -0.06
    "ettel"                 -0.06
    "ADIO"                  -0.06
    ".setVertical"          -0.06
    "ccb"                   -0.06
    " Trails"               -0.06
    Positive Logits
    " irc"                  0.07
    "xFFFFFFFF"             0.07
    (token not rendered)    0.07
    " arise"                0.07
    " backups"              0.06
    " багат"                0.06
    "(remove"               0.06
    "pack"                  0.06
    " mm"                   0.06
    " discrimination"       0.06
    Act Density 0.001%

    No Known Activations