INDEX
    Explanations

    contradictions or contrasting points within the text

    Negative Logits
    Token           Logit
    " namelijk"     -0.61
    "よいよ"         -0.58
    " zove"         -0.54
    "いよいよ"       -0.53
    "Called"        -0.52
    "</caption>"    -0.52
    "ordnen"        -0.51
                    -0.51
    " prostu"       -0.50
    " Ouvrez"       -0.50
    Positive Logits
    Token             Logit
    " nevertheless"   0.97
    " nonetheless"    0.93
    " still"          0.88
    " but"            0.82
    " dennoch"        0.81
    " trotzdem"       0.79
    " But"            0.78
    " Nonetheless"    0.77
    "Nonetheless"     0.77
    " Nevertheless"   0.76
    Activation Density: 0.322%

    No Known Activations