INDEX
    Explanations
    Negative Logits
    balance       -0.07
    Ter           -0.07
     municipal    -0.07
     urban        -0.07
     víc          -0.06
     bre          -0.06
     SPL          -0.06
    diff          -0.06
    Bar           -0.06
     SB           -0.06
    Positive Logits
    Proto         0.07
    builtin       0.07
     공격          0.07
    ंट            0.07
     Phó          0.07
    ��            0.06
     Wish         0.06
     warped       0.06
    TS            0.06
     provoc       0.06
    Activation Density: 0.006%

    No Known Activations