INDEX
    Explanations

    phrases indicating contrast or exceptions

    instances of the word "although"

    Negative Logits
    Token               Logit
    "Eye"               -0.76
    "ais"               -0.70
    "ized"              -0.69
    "hal"               -0.69
    "ledged"            -0.68
    "edu"               -0.67
    "Ing"               -0.67
    "lean"              -0.67
    "elle"              -0.66
    "tnc"               -0.66

    Positive Logits
    Token               Logit
    "soever"            0.87
    "yip"               0.86
    "thood"             0.79
    " acknowledging"    0.78
    "terness"           0.76
    "netflix"           0.73
    " conced"           0.72
    " agreeing"         0.71
    "userc"             0.70
    REDACTED            0.70

    Activation Density: 0.013%

    No Known Activations