    Explanations

    advertisements within text, as indicated by the consistent high activations for the word "Advertisement."

    various occurrences of advertisements

    Negative Logits
    Token             Value
    "wcs"             -0.71
    " manif"          -0.66
    " nonviolent"     -0.64
    " integrity"      -0.62
    " servicing"      -0.61
    " overcoming"     -0.61
    " bip"            -0.59
    "vert"            -0.59
    " thrill"         -0.54
    "gra"             -0.53

    Positive Logits
    Token             Value
    "theless"          1.00
    " Advertisement"   0.89
    "itto"             0.65
    "olicy"            0.64
    "RFC"              0.64
    " }}"              0.62
    "acters"           0.62
    "ulhu"             0.59
    " Reese"           0.59
    "istani"           0.59
    Activation Density: 0.030%

    No Known Activations