INDEX
Explanations
words related to advertising or promotions
content related to advertisements
Negative Logits
assic           -0.71
tant            -0.68
istically       -0.66
inj             -0.60
bis             -0.60
teenth          -0.60
marsh           -0.59
amen            -0.59
Dull            -0.57
kaya            -0.57
Positive Logits
<|endoftext|>   1.00
qus             0.91
Comments        0.85
Advertisements  0.79
edin            0.74
Advertisement   0.74
VERTISEMENT     0.74
Share           0.73
Provided        0.72
Skip            0.72
Activations Density 0.014%
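
The page does not define activation density. A minimal sketch, assuming it means the percentage of token positions at which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    fires at all. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 14 of 100,000 token positions.
sample = [0.0] * 99_986 + [1.0] * 14
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.014% corresponds to roughly 14 active positions per 100,000 tokens, which fits a sparse feature that fires mainly on advertisement-related page text.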