INDEX
    Explanations

    statements and actions related to political accountability and critique

    Negative Logits
    " stol"       -0.18
    "δά"          -0.17
    "celed"       -0.16
    " призна"     -0.14
    "_TD"         -0.14
    " فوت"        -0.14
    "annes"       -0.14
    " potvr"      -0.14
    "ZR"          -0.14
    "_FT"         -0.14
    Positive Logits
    " critique"    0.35
    " criticism"   0.31
    " reb"         0.30
    " critiques"   0.30
    " repro"       0.30
    " Crit"        0.29
    " critic"      0.28
    " criticize"   0.28
    " crit"        0.27
    "crit"         0.27
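One common way such logit lists are produced is to project the feature's decoder direction through the model's unembedding matrix and rank every vocabulary token by the resulting score. The dashboard does not state its exact procedure, so the following is a minimal sketch under that assumption. The names `top_logits`, `decoder_dir` and `unembed` are hypothetical, and any normalization or scaling a real pipeline might apply is not modeled.

```python
from typing import Sequence


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: score each token by decoder_dir @ unembed and
    return the k most negative and the k most positive (token, score) pairs."""
    d_model = len(decoder_dir)
    # Direct effect on token j's logit: sum_i decoder_dir[i] * unembed[i][j]
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most suppressed tokens, lowest first
    positive = ranked[::-1][:k]    # most promoted tokens, highest first
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logits(
    [1.0, 0.0],
    [[0.35, -0.18, 0.1], [0.0, 0.0, 0.0]],
    [" critique", " stol", " other"],
    k=1,
)
print(neg, pos)
```

Tokens are kept as raw strings, so " crit" (with a leading space) and "crit" stay distinct entries, as they do in the list above.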
    Activation Density 0.459%
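Activation density is typically read as the share of token positions on which the feature fires. The dashboard gives neither the corpus nor the firing threshold. The sketch below therefore assumes a threshold of zero and a flat list of per-token activations. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose activation
    exceeds `threshold` (assumed to be 0.0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# A density of 0.459% corresponds to, e.g., 459 firing positions out of 100,000.
sample = [0.0] * 99_541 + [0.7] * 459
print(activation_density(sample))
```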

    No Known Activations