Explanations
articles or sections in a document that are followed by advertisements
instances of advertisements or promotional content
Negative Logits
veland        -0.70
accus         -0.69
anus          -0.66
abusing       -0.66
homebrew      -0.63
aur           -0.62
ynchronous    -0.61
Emin          -0.61
xual          -0.60
mbuds         -0.59
Positive Logits
SPONSORED      0.84
Advertisement  0.78
VERTISEMENT    0.77
Space          0.72
ãĤ¨ãĥ«         0.72
Story          0.71
Related        0.70
ILE            0.69
JUST           0.69
Layer          0.69
Activations Density 0.033%