INDEX
    Explanations

    phrases related to trustworthy news or newsletters

    questions about trustworthy news sources

    Negative Logits
    Token             Logit
    "ortium"          -0.73
    "ioned"           -0.70
    "onne"            -0.68
    "pherd"           -0.67
    "inguished"       -0.66
    " subconscious"   -0.65
    "edi"             -0.63
    " phased"         -0.62
    "ignt"            -0.62
    "uctor"           -0.62

    Positive Logits
    Token             Logit
    "iframe"          0.69
    " Airl"           0.68
    "BRE"             0.67
    " Transcript"     0.67
    "taboola"         0.67
    " UNHCR"          0.65
    " Cookies"        0.64
    " Shelter"        0.64
    "Afee"            0.63
    "Subscribe"       0.63

    Activation Density: 0.059%

    No Known Activations