INDEX
    Explanations

    words related to social issues or societal concerns

    references to social issues and concepts

    Negative Logits
    Token          Logit
    "nces"         -0.93
    "xual"         -0.80
    "20439"        -0.71
    "1001"         -0.69
    "ller"         -0.67
    "ï¸ı"          -0.66
    " Centauri"    -0.65
    "butt"         -0.65
    "zzle"         -0.65
    " Blossom"     -0.65
    Positive Logits
    Token          Logit
    "ized"         0.92
    "izing"        0.90
    "istic"        0.89
    "ization"      0.83
    " norms"       0.83
    " democr"      0.82
    "ised"         0.82
    " welfare"     0.80
    "ists"         0.80
    "izes"         0.79
    Activation Density: 0.026%

    No Known Activations