    Explanations

    advertisements within text

    instances of advertisements or promotional content

    Negative Logits
    Token            Logit
    "ĪĴ"             -0.87
    "eele"           -0.66
    " vigilance"     -0.63
    "quartered"      -0.61
    "ħĭ"             -0.61
    "roit"           -0.61
    "sole"           -0.61
    " defe"          -0.60
    " resilience"    -0.60
    "alist"          -0.59
    Positive Logits
    Token            Logit
    "é¾įå"           0.76
    "Featured"       0.74
    "advertisement"  0.72
    " Videos"        0.71
    " VIDEOS"        0.68
    "Ïģ"             0.66
    "EH"             0.63
    "Spoiler"        0.63
    "adish"          0.63
    "taboola"        0.60
    Activation Density: 0.042%

    No Known Activations