INDEX
    Explanations

    mentions of personalized content such as logins, emails, and tasks

    the word "your" and its variations in different contexts

    Negative Logits
        Token             Logit
        "apo"             -0.94
        "forth"           -0.78
        " Cohn"           -0.75
        " Goes"           -0.72
        "Lago"            -0.71
        " Originally"     -0.68
        " Shapiro"        -0.66
        " Epstein"        -0.66
        "wik"             -0.65
        "aways"           -0.64
    Positive Logits
        Token             Logit
        " own"            1.40
        " favourite"      1.17
        " favorite"       1.07
        " adversary"      0.94
        "anmar"           0.93
        "ocard"           0.92
        " desired"        0.89
        " opponent"       0.89
        " preferred"      0.89
        " imagination"    0.88
    Activation Density: 0.105%

    No Known Activations