INDEX
    Explanations

    phrases indicating contrast or disagreement

    signals or markers indicating the beginning or end of a text segment

    Negative Logits
    代        -0.72
    pecially  -0.69
    Unit      -0.60
    senal     -0.58
    ォ        -0.58
    renheit   -0.56
    omever    -0.55
    覚醒      -0.55
    ular      -0.55
    However   -0.55
    Positive Logits
     nonetheless   1.12
     nevertheless  0.90
    etheless       0.80
     still         0.64
     scept         0.64
     anyway        0.64
     beware        0.64
     undeniable    0.62
     doubts        0.62
     caveats       0.60
    Activation Density: 0.644%

    No Known Activations