    Negative Logits
    Token           Logit
    "ITY"           -0.08
    " crap"         -0.07
    "oooo"          -0.07
    " parl"         -0.07
    "ιώ"            -0.07
    " ***"          -0.07
    " magazines"    -0.06
    "989"           -0.06
    "[y"            -0.06
    " albums"       -0.06
    Positive Logits
    Token           Logit
    " advocating"   0.07
    "业务"          0.06
    " Вар"          0.06
    " UE"           0.06
    " criminals"    0.06
    " advocate"     0.06
    ".align"        0.06
    "WP"            0.06
    " cmp"          0.06
    "اعدة"          0.06
    Activation Density: 0.007%

    No Known Activations