INDEX
    Explanations

    words indicating weakness or negative qualities

    Negative Logits
    Token        Logit
    "cek"        -0.16
    "ien"        -0.15
    "ers"        -0.15
    "iesta"      -0.14
    "eka"        -0.14
    "ona"        -0.14
    "routine"    -0.14
    "olib"       -0.14
    " +"         -0.13
    "ve"         -0.13
    Positive Logits
    Token        Logit
    " that"      0.25
    " bahwa"     0.25
    "that"       0.25
    " rằng"      0.24
    " että"      0.22
    " daß"       0.21
    " dass"      0.20
    " že"        0.20
    " что"       0.20
    "\tthat"     0.20
    Activation Density: 0.121%

    No Known Activations