INDEX
    Explanations

    references to misinformation and false claims surrounding political events

    Negative Logits
    Âı        -0.07
    opard     -0.07
    arkin     -0.07
    çĬ¬       -0.06
    민         -0.06
    annonce   -0.06
     DEAL     -0.06
     hypoc    -0.06
     Moran    -0.06
    <pre      -0.06
    Positive Logits
    uraa      0.07
     linking  0.07
     èģ       0.06
    uien      0.06
     Foo      0.06
    hoo       0.06
    IH        0.06
    undo      0.06
     about    0.06
     Gard     0.06
    Act Density 0.042%

    No Known Activations