INDEX
    Explanations

    mentions of fake news and related terms

    references to "fake news" and misinformation

    Negative Logits
    "aird"              -0.75
    " dues"             -0.75
    "illes"             -0.72
    "ktop"              -0.71
    "atri"              -0.70
    "onding"            -0.68
    "foreseen"          -0.68
    "airo"              -0.67
    "anse"              -0.67
    "emale"             -0.66

    Positive Logits
    "ument"              1.11
    " disinformation"    1.02
    " misinformation"    1.01
    " propag"            1.00
    " perpetrated"       1.00
    " pedd"              0.98
    " concoct"           0.97
    " falsehood"         0.97
    " nonsense"          0.95
    " debunked"          0.93
    Act Density 0.197%
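
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions at which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from this page:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The source page does not confirm this.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 1,000 token positions, 2 of which activate the feature.
sample = [0.0] * 998 + [3.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that reading, a density of 0.197% would mean the feature fires on roughly 2 of every 1,000 tokens.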

    No Known Activations