    NEGATIVE LOGITS
    Token             Logit
    ".gb"             -0.08
    " CLOCK"          -0.07
    "pliers"          -0.07
    "...)"            -0.06
    "PAL"             -0.06
    "89"              -0.06
    "ädchen"          -0.06
    " обеспеч"        -0.06
    " nicely"         -0.06
    " 생활"           -0.06
    POSITIVE LOGITS
    Token             Logit
    "_compress"       0.07
    "енью"            0.07
    " unemployment"   0.06
    "same"            0.06
    (not shown)       0.06
    "asm"             0.06
    "umat"            0.06
    "Liked"           0.06
    " stem"           0.06
    "(delete"         0.06
    Activation Density: 0.001%

    No Known Activations