INDEX
    Explanations

    references to news sources, particularly Fox News

    Negative Logits
    io              -0.17
    ocks            -0.16
    ew              -0.15
    iro             -0.15
    ira             -0.15
    ied             -0.15
    ents            -0.14
    assis           -0.14
     Forg           -0.14
    sk              -0.14
    Positive Logits
    andum           0.18
    اÙĥÙħ            0.15
    à¤Ĩप            0.14
    ãĥ¼ãĥł          0.14
    зÑĥ             0.14
    lexport         0.14
    Ð               0.14
    ModelProperty   0.14
     amazon         0.14
    #__             0.13
    Activation Density 0.003%

    No Known Activations