INDEX
    Explanations

    expressions that convey contrast or opposition in statements

    Negative Logits
    Token          Logit
    "erland"       -0.18
    "EMPLARY"      -0.17
    "loor"         -0.16
    "lew"          -0.16
    "824"          -0.15
    "atee"         -0.15
    "ERO"          -0.15
    " neod"        -0.14
    "_Tis"         -0.14
    "ospace"       -0.14
    Positive Logits
    Token          Logit
    " vice"        1.06
    " Vice"        0.85
    "vice"         0.82
    " reverse"     0.65
    "reverse"      0.56
    " Conversely"  0.55
    " converse"    0.54
    " Reverse"     0.52
    "Reverse"      0.52
    "VICE"         0.51
    Activation Density: 0.126%

    No Known Activations