INDEX
    Explanations

    contrasts or negations in statements

    Negative Logits
    uron        -0.16
    रण          -0.15
    venes       -0.15
    aru         -0.14
    atel        -0.14
    esty        -0.14
    enin        -0.14
    qing        -0.14
     boarding   -0.14
    rong        -0.14
    Positive Logits
    esser       0.16
    ape         0.16
    구          0.15
    ブロ        0.14
    ρο          0.14
     Нас        0.14
    596         0.14
    ắc          0.14
    thon        0.14
     Lever      0.14
    Activation Density 0.160%

    No Known Activations