INDEX
Explanations
phrases that prompt the reader to subscribe or sign up
repeated uses of the word "our"
Negative Logits
Izan        -0.81
bender      -0.78
netflix     -0.74
yang        -0.73
ée          -0.73
lessness    -0.71
stood       -0.71
ivas        -0.71
nesota      -0.71
matter      -0.71
Positive Logits
selves      1.01
own         0.93
newest      0.88
latest      0.86
respective  0.84
sister      0.81
exclusive   0.81
motto       0.80
editorial   0.79
extensive   0.78
Activations Density 0.068%