INDEX
    Explanations

    mentions of the term "fake"

    references to "fake news."

    Negative Logits
    Token                   Logit
    "xual"                  -0.90
    "arching"               -0.80
    "riott"                 -0.78
    "onen"                  -0.76
    "hem"                   -0.73
    " guiActiveUnfocused"   -0.73
    "ands"                  -0.70
    "azard"                 -0.68
    "bard"                  -0.68
    "pour"                  -0.68

    Positive Logits
    Token                   Logit
    "ument"                 0.98
    " pas"                  0.87
    " IDs"                  0.84
    " news"                 0.76
    " positives"            0.74
    "ulent"                 0.73
    " tan"                  0.71
    "ulously"               0.69
    " sounding"             0.67
    " identities"           0.65
    Activation Density: 0.070%

    No Known Activations