    Explanations

    negative evaluations or judgments about arguments or claims

    Negative Logits
    Token        Logit
    "ien"        -0.16
    "idente"     -0.15
    "ve"         -0.15
    "cek"        -0.15
    "odge"       -0.14
    "iversit"    -0.14
    " Sy"        -0.14
    "ona"        -0.14
    "reeze"      -0.14
    "olib"       -0.14
    Positive Logits
    Token        Logit
    " bahwa"     0.25
    " rằng"      0.25
    " that"      0.25
    "that"       0.23
    " että"      0.23
    " daß"       0.20
    " dass"      0.20
    " że"        0.19
    " že"        0.19
    " hogy"      0.19
    Activation Density 0.122%

    No Known Activations