INDEX
    Explanations

    negative descriptors related to arguments or criticisms

    NEGATIVE LOGITS
    reeze
    -0.16
     bastante
    -0.15
    ers
    -0.15
    olib
    -0.15
    cek
    -0.15
    ien
    -0.15
    ycled
    -0.14
    uku
    -0.14
    wan
    -0.14
    byn
    -0.14
    POSITIVE LOGITS
     that
    0.26
    that
    0.25
     että
    0.20
     daß
    0.19
     что
    0.19
    	that
    0.19
     nobody
    0.19
     it
    0.18
     że
    0.18
     bahwa
    0.18
    Act Density 0.118%

    No Known Activations