INDEX
    Explanations

    words related to actions with significant impact or consequences

    actions related to manipulation or influence

    Negative Logits
    "ãĤ´ãĥ³"          -0.80
    "ffe"             -0.70
    " ?)"             -0.70
    "tesy"            -0.67
    "option"          -0.67
    " wink"           -0.66
    "unny"            -0.63
    " pse"            -0.62
    " âĺ"             -0.62
    " Unsure"         -0.61
    Positive Logits
    " unsuspecting"   0.93
    "uate"            0.84
    " unwanted"       0.77
    "enance"          0.74
    " various"        0.72
    " incoming"       0.71
    " unwitting"      0.70
    " passers"        0.68
    " certain"        0.68
    " alleged"        0.67
    Act Density 0.390%

    No Known Activations
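
Two of the negative-logit tokens (the one scored -0.80 and one scored -0.62) look like mojibake but are more likely raw token bytes shown in a byte-level BPE display alphabet. The sketch below assumes the underlying model uses a GPT-2-style byte-to-character table; the page itself does not say so. The helper names `display_to_bytes` and `decode_display_token` are hypothetical and not part of any dashboard API. It rebuilds that table, inverts it, and recovers the bytes behind a displayed token.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> display character table in the style of GPT-2 byte-level BPE.

    Assumption: the listing above was rendered with this table.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    offset = 0
    # Bytes without a printable form are shifted to code points 256 and up.
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + offset)
            offset += 1
    return mapping


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def display_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token (hypothetical helper)."""
    out = bytearray()
    for ch in token:
        # The listing shows real leading spaces rather than the usual
        # placeholder character, so pass a literal space through as 0x20.
        out.append(0x20 if ch == " " else UNICODE_TO_BYTE[ch])
    return bytes(out)


def decode_display_token(token: str) -> str:
    """Decode a displayed token to text, marking incomplete UTF-8 sequences."""
    return display_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_display_token("ãĤ´ãĥ³"))
    print(display_to_bytes(" âĺ"))
```

Under that assumption, the first token decodes to the katakana pair "ゴン". The second is a space followed by the bytes `E2 98`, the first two bytes of a three-byte UTF-8 sequence, so on its own it does not correspond to a complete character.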