    Negative Logits
     Emacs         -0.08
     "&            -0.07
    eyse           -0.06
     احمد          -0.06
     competing     -0.06
    ctest          -0.06
    _RSP           -0.06
    =test          -0.06
    _FEED          -0.06
     Welfare       -0.06
    Positive Logits
     lifted        0.07
    나요             0.07
    бира           0.07
    -hard          0.07
     gem           0.06
    違い             0.06
    担当             0.06
     vede          0.06
    Notifications  0.06
    아요             0.06
    Activation density: 0.005%

    No Known Activations