INDEX
    Explanations

    phrases related to taking action and making decisions

    repeated references to the word "we."

    Negative Logits
    REDACTED          -0.72
     Publication      -0.71
     gratification    -0.67
    odor              -0.66
    ions              -0.62
     Tai              -0.59
     Nay              -0.58
    more              -0.58
     Eleven           -0.58
     Rowe             -0.57
    Positive Logits
    've               1.32
    're               1.27
    'll               1.09
    asel              1.07
     ourselves        1.07
    'd                1.05
    athered           1.03
    IRD               1.02
    ibo               0.95
    lder              0.94
    Act Density 0.258%

    No Known Activations