INDEX
    Explanations

    phrases indicating contrast or negation

    references to significant events or facts that are often exaggerated or misrepresented

    Negative Logits
    Token          Logit
    "WT"           -0.73
    "acca"         -0.67
    "arine"        -0.66
    "igate"        -0.66
    "agos"         -0.63
    "ecast"        -0.61
    "inas"         -0.60
    " Travels"     -0.60
    " Dialogue"    -0.58
    "ukong"        -0.58
    Positive Logits
    Token              Logit
    " nonetheless"     1.57
    " nevertheless"    1.35
    "etheless"         1.02
    " still"           0.82
    " retained"        0.75
    " darn"            0.74
    " strangely"       0.73
    " awfully"         0.72
    " proble"          0.71
    " undeniably"      0.70
    Act Density 1.184%
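
    The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The source does not confirm this definition.
    """
    if not activations:
        # No sampled positions: report zero rather than divide by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 12 of which are active.
sample = [0.0] * 1000
for i in range(12):
    sample[i * 80] = 0.5

density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density near 1% means the feature is silent on roughly 99 of every 100 sampled tokens.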

    No Known Activations