INDEX
    Explanations

    political criticism

    Negative Logits
    Token            Logit
    " pleasantly"    -0.09
    " roomy"         -0.08
    " mentors"       -0.08
    "moder"          -0.08
    "photos"         -0.07
    " mv"            -0.07
    " reš"           -0.07
    "_ball"          -0.07
    " EOS"           -0.07
    "awesome"        -0.07
    Positive Logits
    Token            Logit
    " propaganda"    0.15
    " ruthless"      0.14
    " neoliberal"    0.14
    " harmful"       0.14
    " misguided"     0.14
    " blatant"       0.14
    "涉嫌"           0.14
                     0.13
    " unethical"     0.13
    " malicious"     0.13
    Activation Density: 0.305%

    No Known Activations