INDEX
    NEGATIVE LOGITS
     Games       -0.07
     Bash        -0.07
    (blank)      -0.07
    plash        -0.06
    :d           -0.06
     relies      -0.06
     '_'         -0.06
    عنی          -0.06
    minecraft    -0.06
    *e           -0.06
    POSITIVE LOGITS
     život       0.07
     cessation   0.06
     طبي         0.06
     Abel        0.06
     obl         0.06
     scorer      0.06
    ...\         0.06
    elper        0.06
    няв          0.06
    (blank)      0.06
    Activation Density: 0.036%

    No Known Activations