    Negative Logits
    Token            Logit
    " automát"       -0.11
    "orneys"         -0.10
    " originals"     -0.09
    " nors"          -0.09
    " yourselves"    -0.09
    "elow"           -0.09
    "******\r\n"     -0.09
    "你们"           -0.08
    "αιν"            -0.08
    " 잘"            -0.08
    Positive Logits
    Token            Logit
    " other"         0.28
    " various"       0.22
    "other"          0.20
    " several"       0.18
    " many"          0.18
    "其他"           0.18
    " Other"         0.18
    " different"     0.17
    " numerous"      0.16
    " others"        0.16
    Activation Density: 0.332%

    No Known Activations