    Explanations

    sentences that discuss comparisons and contrasts between entities

    Negative Logits
    Token          Logit
    " zwar"        -0.17
    "elin"         -0.17
    "ulumi"        -0.15
    "虽然"         -0.15
    "vice"         -0.14
    "ianne"        -0.14
    "볨"           -0.14
    "olley"        -0.14
    "ριν"          -0.14
    "จะได"         -0.14
    Positive Logits
    Token            Logit
    " nonetheless"   0.21
    " also"          0.20
    " nevertheless"  0.19
    " nowhere"       0.17
    "also"           0.16
    " still"         0.16
    "lez"            0.16
    "还是"           0.15
    "tera"           0.15
    "597"            0.15
    Act Density 0.124%
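
    As a rough illustration of how a density figure like the one above could be read, the sketch below assumes that "Act Density" means the percentage of sampled tokens on which the feature has a nonzero activation. That definition, the function name `activation_density`, and the sample data are assumptions for illustration, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is the share of tokens with a nonzero activation.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1,000 tokens, exactly one of which activates the feature.
sample = [0.0] * 999 + [2.5]
print(f"{activation_density(sample):.3f}%")
```

    Under this reading, a density of 0.124% would correspond to roughly one active token in every 800 sampled tokens.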

    No Known Activations