INDEX
    Explanations

    negation and expressions that counter claims or expectations

    NEGATIVE LOGITS
    Token        Logit
    "ybrid"      -0.17
    " neither"   -0.16
    "ถ"          -0.16
    "lef"        -0.16
    " hardly"    -0.15
    "不了"       -0.15
    " no"        -0.15
    "eve"        -0.15
    " Almost"    -0.15
    "affle"      -0.15
    POSITIVE LOGITS
    Token          Logit
    " altogether"  0.24
    " stint"       0.22
    "alto"         0.19
    " mere"        0.17
    " greatly"     0.17
    " Alto"        0.15
    " merely"      0.15
    " seriously"   0.15
    " strictly"    0.15
    "Overall"      0.15
    Activation density: 0.577%

    No Known Activations