INDEX
    Explanations

    terms related to criticism of societal norms and behaviors, particularly focusing on perceived stupidity and hypocrisy

    Negative Logits
    elper         -0.16
    째            -0.16
    _Reset        -0.14
    648           -0.14
    orus          -0.14
     Kind         -0.14
    408           -0.14
    884           -0.14
    ãĥĬãĥ¼        -0.14
    856           -0.14
    Positive Logits
    оÑģÑĤÑĮ           0.16
    .Path             0.15
    GED               0.15
     Ders             0.14
    nat               0.14
    reira             0.14
    EGA               0.14
     CONTRIBUTORS     0.14
    оÑģÑĤи            0.14
    ul                0.14
    Activation Density: 0.416%

    No Known Activations