    Negative Logits
    Token          Logit
    "אק"           -0.08
    " reich"       -0.08
    (blank)        -0.08
    " ARISING"     -0.07
    " Addition"    -0.07
    "א"            -0.07
    " posters"     -0.07
    " Inform"      -0.07
    " indist"      -0.07
    " Werte"       -0.07
    Positive Logits
    Token            Logit
    " grip"          0.12
    "Grip"           0.10
    " گرفت"          0.10
    " released"      0.10
    " Grip"          0.10
    " releasing"     0.10
    " gracefully"    0.10
    "释放"           0.09
    "released"       0.09
    ".release"       0.09
    Activation Density: 0.008%

    No Known Activations