INDEX
    Explanations

    themes related to oppression and societal control

    Negative Logits
    jist      -0.15
    UNUSED    -0.15
    ügen      -0.15
    eldo      -0.14
    ilig      -0.14
    odu       -0.13
    kest      -0.13
    жд        -0.13
    ingo      -0.13
    num       -0.13
    Positive Logits
     perceived    0.28
     daring       0.28
     dared        0.28
    敢            0.26
     slightest    0.26
     dissent      0.25
     critical     0.24
     upp          0.23
     outspoken    0.23
    critical      0.22
    Activation Density 0.350%

    No Known Activations