INDEX
    Explanations

    contradictory statements or contrasting ideas

    Negative Logits
    Token     Logit
    "meta"    -0.66
    "coat"    -0.65
    "zero"    -0.65
    "asks"    -0.61
    "nat"     -0.60
    "emn"     -0.59
    "ILLE"    -0.58
    "und"     -0.58
    "unc"     -0.58
    "ory"     -0.58
    Positive Logits
    Token              Logit
    " rather"          1.71
    "rather"           1.39
    " instead"         1.22
    " Rather"          1.20
    " merely"          1.08
    " nevertheless"    1.07
    " nonetheless"     1.01
    "Rather"           1.00
    " suffice"         0.97
    "Instead"          0.96
    Activation Density: 0.085%

    No Known Activations