    Explanations

    mentions of fake news

    Negative Logits
    "xual"        -0.85
    "Marginal"    -0.73
    "pai"         -0.73
    "hem"         -0.72
    "endez"       -0.69
    "night"       -0.69
    "Reviewer"    -0.68
    "served"      -0.67
    "waukee"      -0.66
    " sqor"       -0.66
    Positive Logits
    " news"          1.00
    " IDs"           0.96
    "ument"          0.96
    " NEWS"          0.87
    " identities"    0.81
    " pas"           0.80
    " positives"     0.80
    "tails"          0.80
    " News"          0.78
    "news"           0.73
    Activation Density: 0.074%

    No Known Activations