INDEX
    Explanations

    themes related to cultural superiority and self-righteousness

    Negative Logits
    Token       Logit
    aktu        -0.17
    allen       -0.16
    _verbose    -0.15
    ebo         -0.15
    ↵↵          -0.15
    ì£          -0.15
    ArrayOf     -0.15
    CLR         -0.14
    _FT         -0.14
    ulur        -0.14
    Positive Logits
    Token        Logit
     self        0.27
     ego         0.26
     superiority 0.26
     arrog       0.25
     pride       0.25
     superior    0.24
     confidence  0.24
     hub         0.24
     arrogance   0.24
     eg          0.23
    Activation Density: 0.167%

    No Known Activations