INDEX
    Explanations

    discussions around political claims and misinformation

    Negative Logits
    Token        Logit
    "ree"        -0.17
    "lias"       -0.16
    "ouv"        -0.16
    "00"         -0.15
    " Otto"      -0.14
    "0"          -0.14
    "ittings"    -0.14
    "nier"       -0.14
    "og"         -0.14
    "kin"        -0.14
    Positive Logits
    Token           Logit
    "ksam"          0.19
    "atik"          0.18
    "ailability"    0.16
    "ienza"         0.15
    " 影"           0.15
    " Wid"          0.15
    "repos"         0.15
    "ometown"       0.14
    " Jad"          0.14
    "_ctxt"         0.14
    Act Density 0.278%

    No Known Activations