INDEX
    Explanations

    elements related to societal control and censorship

    Negative Logits
    orsch
    -0.15
    jed
    -0.14
     chung
    -0.13
    ereg
    -0.13
     Patch
    -0.13
    .gz
    -0.13
    жд
    -0.13
    ещ
    -0.13
    важа
    -0.13
    ότητα
    -0.12
    Positive Logits
     dared
    0.33
     daring
    0.32
     dissent
    0.32
     dare
    0.32
     upp
    0.29
     disple
    0.28
     inconvenient
    0.27
     challenge
    0.27
    敢
    0.26
     disagree
    0.25
    Act Density 0.201%

    No Known Activations