    Explanations

    align-self, align-content

    Negative Logits
    coding      -0.07
     fight      -0.07
     seizing    -0.07
     fights     -0.07
    UC          -0.07
    𝓬           -0.07
    𝑣           -0.07
     yaptı      -0.07
                -0.07
     invading   -0.07
    Positive Logits
     Nass       0.07
                0.07
     QtGui      0.06
    ]]:↵        0.06
    fixed       0.06
                0.06
     toilet     0.06
                0.06
     Rivera     0.06
    !(↵         0.06
    Activation Density: 0.003%

    No Known Activations