INDEX
    Explanations

    references to political figures and issues related to falsehoods in media

    Negative Logits
    Token          Logit
    "Telegram"     -0.18
    " Telegram"    -0.16
    "202"          -0.16
    "ĨĴ"           -0.15
    "arl"          -0.15
    " masks"       -0.15
    "deg"          -0.14
    "gal"          -0.14
    "ï¿"           -0.14
    "747"          -0.14

    Positive Logits
    Token          Logit
    "icus"         0.18
    "ãĥ¼ãĥª"       0.17
    "prites"       0.16
    "Uvs"          0.16
    "dued"         0.15
    " Meadow"      0.15
    "nicos"        0.14
    "htar"         0.14
    "icum"         0.14
    "apiro"        0.14

    Activation Density: 0.263%

    No Known Activations