INDEX
Explanations
mentions of personalized content such as logins, emails, and tasks
the word "your" and its variations in different contexts
Negative Logits
apo          -0.94
forth        -0.78
Cohn         -0.75
Goes         -0.72
Lago         -0.71
Originally   -0.68
Shapiro      -0.66
Epstein      -0.66
wik          -0.65
aways        -0.64
Positive Logits
own          1.40
favourite    1.17
favorite     1.07
adversary    0.94
anmar        0.93
ocard        0.92
desired      0.89
opponent     0.89
preferred    0.89
imagination  0.88
Activations Density: 0.105%