INDEX
    Explanations

    instances of advertising content

    Negative Logits
    " tant"        -0.58
    "amen"         -0.56
    "ë"            -0.56
    "istically"    -0.55
    "teenth"       -0.54
    "é¾įåĸļ士"    -0.54
    " toughness"   -0.53
    " prob"        -0.53
    " Dull"        -0.52
    "mans"         -0.52
    Positive Logits
    "<|endoftext|>"    1.21
    " Advertisement"   0.91
    "qus"              0.83
    "Advertisements"   0.82
    " Provided"        0.81
    "Comments"         0.79
    " Subscribe"       0.79
    " Posts"           0.77
    " Helpful"         0.76
    " Comments"        0.74
    Activation Density: 0.041%

    No Known Activations