    Explanations

    references to "fake news" and discussions about media trustworthiness

    Negative Logits
    "ollar"     -0.21
    "ollo"      -0.16
    "utzer"     -0.16
    "tick"      -0.15
    "ullan"     -0.15
    "uze"       -0.15
    " Toll"     -0.15
    "rame"      -0.15
    "athom"     -0.14
    " toll"     -0.14

    Positive Logits
    "é³"        0.14
    "æŃ©"       0.14
    "290"       0.14
    "uiltin"    0.14
    " Kral"     0.14
    "pec"       0.13
    "igkeit"    0.13
    " Amp"      0.13
    "Qed"       0.13
    "unga"      0.13
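    The page does not state how these logit values are produced. One common convention for feature dashboards of this kind, assumed here rather than confirmed by the source, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the most negative and most positive results. The sketch below uses pure Python and toy vectors; the function names `logit_effects` and `top_and_bottom` and the tokens `tok_a`, `tok_b`, `tok_c` are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed convention: effect of the feature on each token's logit is the
    dot product of the feature's decoder direction with that token's
    unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy example with a 2-dimensional residual stream (hypothetical values).
decoder_dir = [1.0, -0.5]
unembed = {"tok_a": [-0.1, 0.1], "tok_b": [0.2, 0.1], "tok_c": [-0.2, 0.0]}
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=1)
print(neg, pos)
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; with a real vocabulary the same idea applies, only with far longer vectors.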
    Activation density: 0.059%
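    Activation density is read here, as an assumption, as the percentage of sampled tokens on which the feature's activation is above zero. On that reading, 0.059% means roughly 59 firing tokens per 100,000. A minimal sketch of that calculation, with the hypothetical helper names `activation_density` and `format_density`:

```python
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

def format_density(pct: float) -> str:
    """Render the density with three decimal places, as on the page."""
    return f"{pct:.3f}%"

# Hypothetical sample: 59 firing tokens out of 100,000.
acts = [0.0] * 99_941 + [0.3] * 59
print(format_density(activation_density(acts)))
```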

    No Known Activations