    Explanations

    mentions of subscribing to newsletters

    instances of the word "our"

    Negative Logits
    Token            Logit
    "bender"         -0.83
    "netflix"        -0.77
    "yang"           -0.77
    "dn"             -0.73
    "edi"            -0.72
    "FU"             -0.71
    "lessness"       -0.69
    "matter"         -0.68
    "stood"          -0.68
    " Izan"          -0.68

    Positive Logits
    Token            Logit
    "selves"          1.07
    " own"            1.01
    " respective"     0.89
    " newest"         0.88
    " handy"          0.87
    " latest"         0.87
    " exclusive"      0.82
    " inbox"          0.81
    " motto"          0.80
    " sister"         0.79
    Act Density 0.068%

    No Known Activations