    Explanations

    contradictions or contrasting statements

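    Negative Logits
    (tokens are quoted because a leading space is part of the token)
        Token           Logit   English gloss
        " zwar"         -0.93   admittedly (German)
        " sice"         -0.72   admittedly, although (Czech)
        " оригіналу"    -0.68   of the original (Ukrainian)
        " übrigens"     -0.68   by the way (German)
        " even"         -0.64
        " além"         -0.62   besides, beyond (Portuguese)
        "だけでなく"    -0.61   not only (Japanese)
        "even"          -0.60
        "はもちろん"    -0.60   not to mention (Japanese)
        " nejen"        -0.60   not only (Czech)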
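    Positive Logits
        Token           Logit   English gloss
        " nonetheless"   1.97
        " nevertheless"  1.90
        " dennoch"       1.32   nevertheless (German)
        "それでも"       1.27   even so (Japanese)
        "Nonetheless"    1.22
        " trotzdem"      1.20   nonetheless, despite that (German)
        "Nevertheless"   1.17
        " néanmoins"     1.17   nevertheless (French)
        " зато"          1.16   but on the other hand (Russian)
        " ändå"          1.16   still, anyway (Swedish)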
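The logit lists above show which output tokens this feature pushes up or down. Assuming the dashboard follows the common approach of projecting the feature's decoder direction through the model's unembedding matrix, the lists could be produced roughly as in the sketch below. The helper name `top_logit_effects` and the toy matrices are hypothetical, chosen only for illustration; they are not the dashboard's actual code or data.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=3):
    """Hypothetical helper: project a feature's decoder direction (d_model,)
    through an unembedding matrix W_U (d_model, vocab_size) and return the
    k most negative and k most positive (token, logit) pairs."""
    logits = decoder_dir @ W_U           # direct effect on each output token
    order = np.argsort(logits)           # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example (made-up numbers): d_model = 2, vocabulary of 4 tokens.
vocab = [" nonetheless", " zwar", " even", " the"]
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 1.0]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)  # [(' zwar', -1.0), (' the', 0.0)]
print(pos)  # [(' nonetheless', 2.0), (' even', 0.5)]
```

Because this projection ignores later layers and layer norm, it measures only the feature's direct effect on the output logits.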
    Activation Density: 0.372%

    No Known Activations
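The activation density of 0.372% reported above is the share of tokens on which the feature fires. The sketch below assumes that "fires" means a strictly positive activation. The function name and the synthetic activations are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose feature activation
    is strictly greater than `threshold` (assumed to be 0)."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())

# Synthetic data: the feature fires on 372 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:372] = 0.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.372%
```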