INDEX
Explanations
phrases indicating self-promotion or promotion of one's own content
references to personal or community-related projects and content
Negative Logits
needed          -0.77
esters          -0.76
belongs         -0.75
payers          -0.75
oke             -0.75
forces          -0.74
arians          -0.74
rays            -0.74
okemon          -0.72
ype             -0.71
Positive Logits
own             1.07
latest          0.99
blog            0.98
inaugural       0.94
newest          0.94
introductory    0.93
exhaustive      0.90
upcoming        0.89
portfolio       0.89
guide           0.88
Activations Density 0.232%