INDEX
    Explanations

    contrasting or opposing statements throughout the text

    Negative Logits
     even
    -0.21
    even
    -0.20
     although
    -0.20
     EVEN
    -0.18
     zwar
    -0.18
     actually
    -0.18
    甚至
    -0.18
     además
    -0.18
     Even
    -0.17
     truly
    -0.17
    Positive Logits
     nevertheless
    0.53
     nonetheless
    0.52
    Nevertheless
    0.45
     Nonetheless
    0.37
     Nevertheless
    0.36
    却
    0.26
    还是
    0.24
     certainly
    0.24
    theless
    0.23
     yine
    0.23
    Act Density 0.606%

    No Known Activations