    Explanations

    words related to beliefs, stereotypes, and societal issues

    beliefs or stereotypes surrounding masculinity and gender roles

    Negative Logits
    Token           Logit
    " sidx"         -0.74
    "cussion"       -0.70
    "ovember"       -0.69
    "cember"        -0.68
    "arthed"        -0.67
    "laughs"        -0.67
    "Multiple"      -0.67
    "mentioned"     -0.67
    "adel"          -0.66
    "ftime"         -0.65
    Positive Logits
    Token           Logit
    " somehow"       1.16
    " magically"     1.11
    " infall"        1.10
    " immutable"     1.09
    " superiority"   1.03
    " invincible"    1.00
    " innate"        1.00
    " inherently"    0.99
    " virtuous"      0.96
    " benevolent"    0.94
    Act Density 0.737%

    No Known Activations