INDEX
    Explanations

    references to propaganda efforts and media manipulation

    Negative Logits
     impro
    -0.15
    esture
    -0.15
     Compression
    -0.14
    ocities
    -0.14
    opard
    -0.14
    981
    -0.13
     peripheral
    -0.13
     Trades
    -0.13
    ерь
    -0.13
    chwitz
    -0.12
    Positive Logits
     propaganda
    0.38
     Prop
    0.38
    prop
    0.37
    -prop
    0.36
     propag
    0.35
     проп
    0.34
     PROP
    0.34
    Prop
    0.28
     propagation
    0.28
    PROP
    0.27
    Act Density 0.116%

    No Known Activations