    Explanations

    content related to gender, specifically gender-neutral facilities and legal requirements for bathroom use based on gender identity

    topics related to gender identity and related policies

    Negative Logits
     Hacker
    -0.64
    %]
    -0.58
     betrayal
    -0.57
     Journalism
    -0.56
     extrap
    -0.56
     ¯
    -0.55
     arrogance
    -0.54
    }:
    -0.53
     surprises
    -0.53
     underest
    -0.53
    Positive Logits
    instead
    0.76
    izont
    0.72
     safely
    0.69
    their
    0.67
     lawfully
    0.67
     uninterrupted
    0.66
     instead
    0.66
    disabled
    0.64
    cise
    0.63
    clusive
    0.61
    Activation Density: 1.120%

    No Known Activations