    Explanations

    contrastive conjunctions and phrases indicating an opposing viewpoint

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    " Zwar"                                  -0.89
    " كومونز"                                -0.83
    " snippetHide"                           -0.81
    " Mentre"                                -0.76
    " Obwohl"                                -0.72
    " للمعارف"                               -0.72
    " فريبيس"                                -0.70
    " Embora"                                -0.67
    "AutoresizingMask"                       -0.66
    " embora"                                -0.64
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    " hey"                                   1.29
    " Hey"                                   0.76
    " HEY"                                   0.75
    "Hey"                                    0.75
    " considering"                           0.73
    " nonetheless"                           0.73
    "hey"                                    0.73
    " I"                                     0.69
    " heck"                                  0.67
    " eh"                                    0.67
    Activation Density: 0.155%

    No Known Activations