INDEX
    Explanations

    phrases emphasizing contrast or negation

    negations and expressions of denial

    Negative Logits
    psey              -0.64
     ce               -0.62
    often             -0.61
    typically         -0.59
     Ens              -0.56
     Advertisement    -0.56
     airs             -0.56
     shaman           -0.56
    peace             -0.56
     psi              -0.56

    Positive Logits
    onen               0.76
    \\\\\\\\           0.69
    ounters            0.68
    ibe                0.67
     benefited         0.66
    omical             0.66
    gged               0.65
    rist               0.64
    å¤                 0.63
    idon               0.63

    Act Density 0.262%

    No Known Activations