INDEX
    Explanations

    mentions of different ideologies and their related concepts

    terms related to ideology and its various manifestations

    Negative Logits
    "ells"      -0.90
    " Mamm"     -0.81
    "FACE"      -0.74
    "rooms"     -0.74
    "tub"       -0.74
    "ibli"      -0.73
    "backs"     -0.72
    "theless"   -0.71
    "teen"      -0.71
    "ilet"      -0.67
    Positive Logits
    " ideology"       0.99
    " indoctr"        0.93
    " affiliation"    0.89
    "eering"          0.87
    " theorist"       0.83
    " theoret"        0.83
    " ide"            0.82
    " guiActiveUn"    0.81
    " ideologies"     0.80
    " purity"         0.79
    Activation Density: 0.017%

    No Known Activations