INDEX
    Explanations

    expressions related to nationalism and identity

    Negative Logits
    Token            Logit
    "enko"           -0.14
    " Tier"          -0.14
    "eree"           -0.14
    " feminist"      -0.14
    "ltre"           -0.14
    " Bias"          -0.14
    "kova"           -0.14
    " Femin"         -0.14
    "ubu"            -0.14
    "earer"          -0.13
    Positive Logits
    Token            Logit
    " identity"      0.36
    "identity"       0.34
    " Identity"      0.32
    "Identity"       0.29
    "-national"      0.29
    " nationalism"   0.27
    " national"      0.27
    "national"       0.27
    " identities"    0.26
    " nation"        0.26
    Activation Density: 0.127%

    No Known Activations