INDEX
    Explanations

    words related to politics, control, and authority

    concepts related to political and social dynamics

    Negative Logits
    ymm       -0.56
    arnaev    -0.55
    xon       -0.53
    atom      -0.52
    ULTS      -0.52
    onz       -0.52
    qua       -0.51
    ãĥ£       -0.51
    cci       -0.51
    ortium    -0.51
    Positive Logits
     itself        0.67
     herself       0.60
     himself       0.55
     yourself      0.54
     peripher      0.54
     POV           0.54
     altogether    0.53
     motif         0.52
    pedia          0.52
     entails       0.51
    Activation Density: 1.094%

    No Known Activations