INDEX
    Explanations

    mentions of different ideologies

    references to various ideologies

    Negative Logits
    Token           Logit
    "de"            -0.75
    "upon"          -0.73
    " Mamm"         -0.72
    "ells"          -0.72
    "ten"           -0.71
    "EVA"           -0.70
    "shall"         -0.69
    "upper"         -0.67
    "hap"           -0.67
    "teen"          -0.67
    Positive Logits
    Token           Logit
    " ideology"     1.09
    " indoctr"      0.96
    "eering"        0.92
    " theorist"     0.89
    " guiActiveUn"  0.89
    " ideologies"   0.83
    " affiliation"  0.83
    " underpin"     0.81
    "ologue"        0.80
    "ologies"       0.80
    Act Density 0.014%

    No Known Activations