    Negative Logits
    Token          Logit
    " morals"      -0.09
    "learning"     -0.08
    " coder"       -0.08
    "inanders"     -0.08
    "中彩票"       -0.08
    " roulette"    -0.08
    "ponge"        -0.08
    "roulette"     -0.08
    "研发"         -0.08
    " learning"    -0.08
    Positive Logits
    Token          Logit
    "(Sub"         0.09
    "(Mock"        0.09
    "(S"           0.09
    " приг"        0.08
    " Northern"    0.08
    " Inn"         0.08
    "(M"           0.08
    " shoulder"    0.08
    "(B"           0.08
    "Clock"        0.07
    Activation Density: 0.013%

    No Known Activations