INDEX
    Explanations

    phrases indicating negation or contradiction

    negations or expressions of denial

    Negative Logits
    "psey"            -0.69
    "often"           -0.64
    " purportedly"    -0.62
    " progressively"  -0.61
    "ĻĤ"              -0.60
    " supposedly"     -0.58
    " allegedly"      -0.57
    "usually"         -0.57
    " dunno"          -0.57
    " Advertisement"  -0.56
    Positive Logits
    "rist"            0.73
    "onen"            0.72
    "osher"           0.67
    "å¤"              0.66
    "sson"            0.64
    "ãĤ¤ãĥĪ"          0.64
    "olics"           0.63
    "ãĤª"             0.63
    " Monteneg"       0.62
    "von"             0.62
    Activation Density: 0.264%

    No Known Activations