INDEX
    Explanations

    phrases related to contrasting statements or clauses

    Negative Logits
    Token        Logit
    "覚醒"       -0.62
    "sth"        -0.62
    "ode"        -0.61
    "eers"       -0.60
    "cial"       -0.58
    "ttes"       -0.58
    "nexus"      -0.57
    "://"        -0.57
    "ruction"    -0.56
    "ze"         -0.54
    Positive Logits
    Token                Logit
    " unlike"            1.19
    " alas"              1.16
    " contrary"          1.12
    " despite"           1.09
    " although"          1.05
    " hey"               1.04
    "despite"            0.99
    " barring"           0.97
    " unsurprisingly"    0.96
    " unfortunately"     0.95
    Activation Density: 0.085%

    No Known Activations