INDEX
    Explanations

    mentions of "our" or possessive forms in a text

    expressions of gratitude and appreciation

    Negative Logits
    Token             Logit
    "puff"            -0.78
    "tar"             -0.77
    "bender"          -0.77
    "icter"           -0.75
    "atican"          -0.70
    "conom"           -0.68
    " REUTERS"        -0.68
    "more"            -0.68
    "netflix"         -0.67
    " contradicts"    -0.66
    Positive Logits
    Token             Logit
    "selves"          1.47
    " own"            1.22
    " respective"     0.98
    " collective"     0.97
    " selves"         0.94
    " asses"          0.93
    " adversaries"    0.90
    " dear"           0.90
    " motto"          0.86
    " beloved"        0.84
    Activation Density: 0.123%

    No Known Activations