INDEX
    Explanations

    preventing something

    Negative Logits
    "ENU"  -0.08
    " Dude"  -0.07
    "еко"  -0.06
    "(T"  -0.06
    " disagree"  -0.06
    " \↵"  -0.06
    " determining"  -0.06
    (token not shown)  -0.06
    "/part"  -0.06
    (token not shown)  -0.06

    Positive Logits
    "інь"  0.07
    "альной"  0.07
    "tsx"  0.06
    "oma"  0.06
    "ंय"  0.06
    "kám"  0.06
    "ulent"  0.06
    "ки"  0.06
    "ergy"  0.06
    " بین"  0.06

    Act Density 0.055%

    No Known Activations