    Negative Logits
    Token            Logit
    "'It"            -0.07
    " שמ"            -0.07
    (token missing)  -0.07
    "=image"         -0.07
    (token missing)  -0.06
    " atenção"       -0.06
    "azor"           -0.06
    "糟糕"           -0.06
    " Listen"        -0.06
    "在职"           -0.06
    Positive Logits
    Token            Logit
    "аст"            0.08
    "ustrial"        0.07
    "subscriber"     0.06
    "ankind"         0.06
    " wür"           0.06
    "(',',"          0.06
    "-w"             0.06
    " societal"      0.06
    "anc"            0.06
    " partic"        0.06
    Act Density: 0.003%

    No Known Activations