INDEX
    Explanations

    mentions of or references to fake information or news

    references to "fake news"

    Negative Logits
    "xual"      -0.87
    "ires"      -0.75
    "ands"      -0.75
    "arching"   -0.73
    "}}}"       -0.70
    "APTER"     -0.70
    "anded"     -0.69
    "draw"      -0.68
    "Thom"      -0.68
    "azar"      -0.66
    Positive Logits
    " pas"      0.80
    " fake"     0.79
    "²¾"        0.76
    "ument"     0.74
    " Fake"     0.72
    "ーティ"    0.68
    "eln"       0.68
    "ulously"   0.67
    " monster"  0.67
    "outs"      0.67
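Logit lists like the two above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix W_U and reading off the highest- and lowest-scoring tokens. The sketch below assumes exactly that procedure, ignoring any final LayerNorm or scaling the dashboard may apply; the function name `feature_logits` and its inputs are hypothetical, and the toy numbers in the usage are invented for illustration only.

```python
from typing import List, Tuple


def feature_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: score every token by decoder_dir @ W_U.

    decoder_dir: the feature's decoder vector, length d_model (assumed input).
    unembed:     W_U as a d_model x vocab_size nested list (assumed input).
    vocab:       token strings, one per column of W_U.
    Returns (top-k positive, top-k negative) lists of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # logit[t] = sum_j decoder_dir[j] * W_U[j][t]
    logits = [
        sum(decoder_dir[j] * unembed[j][t] for j in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending once; the ends of the ranking give both lists.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy model: d_model = 2, four tokens; values are made up.
    pos, neg = feature_logits(
        [1.0, 0.5],
        [[0.6, 0.5, -0.7, -0.5], [0.4, 0.4, -0.3, -0.5]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print(pos)
    print(neg)
```

Because only the decoder direction and W_U are involved, these lists describe what the feature pushes the output toward when it fires, independently of any input text.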
    Activation Density: 0.022%
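Activation density is usually read as the percentage of tokens on which the feature's activation is above zero, so 0.022% corresponds to roughly 22 firing tokens per 100,000. The sketch below assumes that definition and a zero threshold; the function name `activation_density` is hypothetical and the input is a toy sequence, not data from this feature.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens where activation > threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy input: 22 nonzero activations among 100,000 tokens.
    acts = [0.0] * 99_978 + [1.0] * 22
    print(f"{activation_density(acts):.3f}%")
```

A density this low marks a very sparse feature, which is consistent with the narrow "fake news" explanations given above.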

    No Known Activations