    Explanations

    positive adjectives

    Negative Logits
     endereco     -0.08
    ddit          -0.08
     tee          -0.08
     geeign       -0.08
                  -0.07
     Brock        -0.07
    -либо         -0.07
     сям          -0.07
     empleo       -0.07
    ирования      -0.07
    Positive Logits
     remainder    0.08
    ٌ             0.08
    तः            0.08
     ли           0.08
    oil           0.08
     Jack         0.08
                  0.07
     numerical    0.07
    മാണ്          0.07
     advers       0.07
    Activation Density: 0.201%

    No Known Activations