    Explanations

    references to liberalism and related ideologies

    Negative Logits
    antry    -0.18
    jím      -0.18
    anke     -0.17
    νι       -0.17
    orch     -0.16
    ean      -0.16
    alist    -0.15
    onet     -0.15
    iers     -0.15
    ee       -0.15
    Positive Logits
    ised       0.33
    ization    0.32
    isation    0.31
    ized       0.31
    izing      0.29
    ize        0.28
    ising      0.27
    ism        0.27
    izes       0.27
    ises       0.23
    Act Density 0.022%

    No Known Activations