INDEX
    Explanations

    pronouns 'we' and 'our'

    references to collective responsibility or shared experiences

    Negative Logits
    Token             Logit
    "REDACTED"        -0.77
    " Publication"    -0.66
    " gratification"  -0.64
    "odor"            -0.64
    " Crush"          -0.61
    "cum"             -0.60
    "Owner"           -0.60
    "personal"        -0.59
    " Hole"           -0.58
    " Levine"         -0.58
    Positive Logits
    Token             Logit
    "'re"             1.22
    "'ve"             1.21
    "'ll"             0.99
    "athered"         0.98
    "akening"         0.98
    "asel"            0.96
    " ourselves"      0.95
    "ird"             0.93
    "lder"            0.92
    "IRD"             0.92
    Activation Density: 0.239%

    No Known Activations